
Local LLMs

🦙 Fast, Private, Offline AI


🧠 Locally Hosted AI Chat

A private chat application powered by Google Gemma 4 26B, running entirely on a self-managed Oracle Cloud Ampere ARM instance. Every prompt and response stays on the server: no external APIs, no third-party telemetry, and no data ever leaves the machine.
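For orientation, here is a minimal sketch of the server-side inference call, assuming the llama-cpp-python bindings and a hypothetical GGUF filename; the project's actual entry point may differ:

```python
# Local-inference sketch: all computation happens in-process on the server,
# so no prompt or response ever crosses the network boundary.
# Assumptions: llama-cpp-python bindings, hypothetical model path.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-26b-q4_k_m.gguf",  # hypothetical filename
    n_ctx=4096,    # context window
    n_threads=4,   # match the instance's 4 ARM cores
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why does local inference protect privacy?"}],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])
```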

🗂️ Model & Hardware

  • Model: Google Gemma 4 26B, a single high-capability open-weight model selected for its strong reasoning, instruction-following, and multilingual quality
  • Compute: Ampere Altra, 4 ARM64 server-grade cores on the Oracle Cloud Free Tier, optimized for sustained inference workloads
  • Memory: 24 GB RAM, sufficient headroom to load the quantized GGUF weights fully in memory for low-latency CPU inference (see the rough estimate after this list)
  • Inference: 100% CPU, no GPU required; powered by llama.cpp with ARM NEON optimizations
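As a rough sanity check on the memory figure (an estimate, not a measured value): a 26B-parameter model at a typical ~4.5 bits per weight for a Q4_K_M-style GGUF quantization needs on the order of 14 GiB for the weights, leaving room within 24 GB for the KV cache, runtime, and OS.

```python
# Back-of-envelope estimate of the quantized weight footprint.
# Assumptions: 26e9 parameters, ~4.5 bits/weight (typical of a Q4_K_M GGUF);
# KV cache and runtime overhead come on top of this figure.
params = 26e9
bits_per_weight = 4.5
weights_gib = params * bits_per_weight / 8 / 1024**3
print(f"approx. weight footprint: {weights_gib:.1f} GiB")  # ~13.6 GiB
```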

🔐 Features

  • Fully offline inference, no external API required
  • Hosted with NGINX + HTTPS on Oracle Cloud (Free Tier)
  • Powered by llama.cpp using the GGUF model format
  • Interactive web UI built with Streamlit (a minimal sketch follows after this list)
  • Low-latency responses and complete data privacy
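A minimal Streamlit chat loop in this style might look like the sketch below (it assumes the hypothetical llama-cpp-python setup shown earlier; not the project's actual UI code):

```python
# Minimal Streamlit chat UI sketch: keeps the conversation in session state
# and calls the locally loaded model for each new user prompt.
import streamlit as st
from llama_cpp import Llama

@st.cache_resource  # load the model once per server process
def load_model():
    # hypothetical path and settings
    return Llama(model_path="models/gemma-26b-q4_k_m.gguf", n_ctx=4096, n_threads=4)

llm = load_model()

if "messages" not in st.session_state:
    st.session_state.messages = []

# replay the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# handle a new prompt
if prompt := st.chat_input("Ask something..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    result = llm.create_chat_completion(messages=st.session_state.messages)
    answer = result["choices"][0]["message"]["content"]
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```

Caching the model with st.cache_resource matters here: Streamlit reruns the script on every interaction, and reloading multi-gigabyte GGUF weights each time would dominate response latency.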