Available on winget

Install BitLlama

Pure Rust LLM inference engine with 1.58-bit ternary support and Test-Time Training

Install with winget

winget install --id imonoonoko.BitLlama

Upgrade

winget upgrade --id imonoonoko.BitLlama

Uninstall

winget uninstall --id imonoonoko.BitLlama

About BitLlama

BitLlama is a Pure Rust LLM inference engine featuring 1.58-bit ternary quantization, Test-Time Training (TTT), a Soul learning system, an MCP server and client, and private RAG. It supports Llama, Gemma, Mistral, Qwen, and BitNet models and includes an OpenAI-compatible API server.
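"1.58-bit" refers to ternary weights. Each weight takes one of three values {-1, 0, +1}, which carries log2(3) ≈ 1.585 bits of information. This page does not say how BitLlama quantizes weights internally. The sketch below assumes the absmean scheme published for BitNet b1.58: one scale per tensor equal to the mean absolute weight, followed by a divide, round and clamp to [-1, 1]. The function names are illustrative and are not BitLlama's API.

```rust
/// Illustrative absmean ternary quantization (BitNet b1.58 style, assumed):
/// scale = mean(|w|), q = clamp(round(w / (scale + eps)), -1, 1).
fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    const EPS: f32 = 1e-5; // avoids division by zero for all-zero tensors
    if weights.is_empty() {
        return (Vec::new(), 0.0);
    }
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    let q: Vec<i8> = weights
        .iter()
        .map(|w| (w / (scale + EPS)).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

/// Approximate reconstruction of the original weights: w ≈ q * scale.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.9_f32, -0.05, 0.4, -1.2];
    let (q, scale) = quantize_ternary(&w);
    println!("q = {:?}", q); // q = [1, 0, 1, -1]
    println!("scale = {:.4}", scale);
    println!("approx = {:?}", dequantize(&q, scale));
    // Three states per weight give log2(3) bits of information.
    println!("bits per weight = {:.3}", 3f32.log2());
}
```

Because every quantized value is -1, 0 or +1, a matrix-vector product reduces to adding and subtracting activations, followed by a single multiplication by `scale`. That is the main appeal of ternary models. Production engines also pack several ternary values per byte and may use per-block scales. This sketch omits both.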

What's new in 1.0.0

v1.0.0 — Final Release

BitLlama v1.0.0. Development complete.

What is BitLlama?

A Pure Rust LLM inference engine with Soul learning and hierarchical memory.

- 7 model architectures: Llama-2/3, Gemma-2/3, Qwen2.5, Mistral, BitNet
- Soul learning: LoRA fine-tuning from conversations
- Memory system: 4-layer hierarchical memory + 7-stage Sleep consolidation
- Desktop GUI: Tauri 2.0 + Svelte 5, Japanese/English i18n
- Performance: 45.4 tok/s (7B), 90% of llama.cpp
- 1121 tests, quality score 9.0/10

Changes since v0.16.0

- CJK memory search fix (character bigram fallback for Japanese queries)
- Soul learning tests (warmup, chat template, VRAM guard)
- Chat template application fix for GGUF tokenizer fallback
- README/ROADMAP updated to reflect project completion

Install

# Homebrew
brew tap imonoonoko/bitllama && brew install bitllama
# winget
winget install imonoonoko.BitLlama
# Or download binaries below

Built with Rust by @imonoonoko

Full Changelog: v0.16.0...v1.0.0
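Japanese text has no spaces between words. Whitespace tokenization therefore turns a whole query into a single token that rarely matches stored memories. Splitting text into overlapping two-character units (character bigrams) is a common fallback for CJK search. The release notes do not show BitLlama's implementation. The sketch below is a minimal, hypothetical version:

- `char_bigrams` and `bigram_overlap` are illustrative names.
- The overlap score (the fraction of query bigrams found in a document) is an assumed scoring rule.

```rust
use std::collections::HashSet;

/// Hypothetical helper: overlapping character bigrams for each
/// whitespace-separated segment. A one-character segment is kept as-is
/// so very short queries still produce a searchable unit.
fn char_bigrams(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    for segment in text.split_whitespace() {
        let chars: Vec<char> = segment.chars().collect();
        if chars.len() == 1 {
            out.push(segment.to_string());
        } else {
            out.extend(chars.windows(2).map(|w| w.iter().collect::<String>()));
        }
    }
    out
}

/// Hypothetical scoring rule: fraction of the query's bigrams that also
/// appear somewhere in the document (0.0 to 1.0).
fn bigram_overlap(query: &str, doc: &str) -> f64 {
    let q = char_bigrams(query);
    if q.is_empty() {
        return 0.0;
    }
    let d: HashSet<String> = char_bigrams(doc).into_iter().collect();
    let hits = q.iter().filter(|b| d.contains(*b)).count();
    hits as f64 / q.len() as f64
}

fn main() {
    let query = "東京タワー";
    println!("{:?}", char_bigrams(query)); // ["東京", "京タ", "タワ", "ワー"]
    println!("{}", bigram_overlap(query, "昨日は東京タワーに行った"));
    println!("{}", bigram_overlap(query, "京都に行った"));
}
```

A real memory index would typically precompute document bigrams, for example in an inverted index, rather than recomputing them for every query.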


Version history

Version	Updated	Notes
1.0.0	Unknown	v1.0.0 — Final Release BitLlama v1.0.0. Development complete. What is BitLlama? A Pure Rust LLM inference engine with Soul learning and hierarchical memory. - 7 model architectures: Llama-2/3, Gemma-2/3, Qwen2.5, Mistral...
0.16.0	Unknown	Full Changelog: v0.15.0...v0.16.0
0.15.0	Unknown	Release notes