Local LLMs
Fast, Private, Offline AI

Locally Hosted AI Chat
A private chat application powered by Google Gemma 3 27B, running entirely on a self-managed Oracle Cloud Ampere ARM instance. Every prompt and response stays on the server: no external APIs, no third-party telemetry, no data ever leaves the machine.
Model & Hardware
27B
Google Gemma 3 27B
A single high-capability open-weight model, selected for its strong reasoning, instruction following, and multilingual quality
ARM
Ampere Altra · 4 cores
ARM64 server-grade CPU on Oracle Cloud Free Tier, optimized for sustained inference workloads
24GB
24 GB RAM
Sufficient headroom to load the quantized GGUF weights fully in memory for low-latency CPU inference
CPU
100% CPU Inference
No GPU required; powered by llama.cpp with ARM NEON optimizations
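As a rough illustration of this serving path, here is a minimal CPU-only sketch using the llama-cpp-python bindings. The model filename, context size, and generation parameters are assumptions for illustration, not the project's actual configuration.

```python
# Minimal CPU-only inference sketch via llama-cpp-python.
# Model filename and parameters below are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-3-27b-it-Q4_K_M.gguf",  # hypothetical quantized GGUF file
    n_ctx=4096,    # context window; sized to fit comfortably in 24 GB RAM
    n_threads=4,   # match the 4 Ampere Altra cores
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why does on-device inference preserve privacy?"}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```

Loading the quantized weights once at startup and keeping them resident in RAM is what makes repeated prompts low-latency despite running on CPU alone.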
Features
- Fully offline inference, no external API required
- Hosted with NGINX + HTTPS on Oracle Cloud (Free Tier)
- Powered by llama.cpp using the GGUF format
- Interactive web UI built with Streamlit (sketched after this list)
- Low-latency responses and complete data privacy
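To show how the pieces fit together, below is a hedged sketch of a Streamlit chat front end talking to a local llama.cpp server through its OpenAI-compatible endpoint. The address, port, and payload details are assumptions, not the project's exact code.

```python
# Hypothetical Streamlit chat UI backed by a local llama.cpp server.
# The endpoint address is an assumption (llama-server's default port).
import requests
import streamlit as st

st.title("Locally Hosted AI Chat")

if "history" not in st.session_state:
    st.session_state.history = []  # list of {"role": ..., "content": ...} dicts

# Replay prior turns so the conversation survives Streamlit reruns.
for msg in st.session_state.history:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Ask something..."):
    st.session_state.history.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # llama.cpp's llama-server exposes an OpenAI-compatible chat endpoint.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # assumed local address
        json={"messages": st.session_state.history, "max_tokens": 512},
        timeout=300,
    )
    answer = resp.json()["choices"][0]["message"]["content"]

    st.session_state.history.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```

Because the UI only ever calls localhost, the chat needs no outbound network access; in a setup like this, NGINX would sit in front purely to terminate HTTPS, consistent with the hosting note above.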