
Local LLMs

🦙 Fast, Private, Offline AI


🧠 Locally Hosted AI Chat

A private chat application powered by Google Gemma 4 26B, running entirely on a self-managed Oracle Cloud Ampere ARM instance. Every prompt and response stays on the server: no external APIs, no third-party telemetry, and no data ever leaves the machine.
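For orientation, here is a minimal sketch of the server-side inference call, assuming the llama-cpp-python bindings and a hypothetical GGUF filename; the project's actual entry point may differ:

```python
# Local-inference sketch: all computation happens in-process on the server,
# so no prompt or response ever crosses the network boundary.
# Assumptions: llama-cpp-python bindings, hypothetical model path.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-26b-q4_k_m.gguf",  # hypothetical filename
    n_ctx=4096,    # context window
    n_threads=4,   # match the instance's 4 ARM cores
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why does local inference protect privacy?"}],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])
```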

🗂️ Model & Hardware

  • Model: Google Gemma 4 26B, a single high-capability open-weight model selected for its strong reasoning, instruction-following, and multilingual quality
  • Compute: Ampere Altra, 4 ARM64 server-grade cores on the Oracle Cloud Free Tier, optimized for sustained inference workloads
  • Memory: 24 GB RAM, sufficient headroom to load the quantized GGUF weights fully in memory for low-latency CPU inference (see the rough estimate after this list)
  • Inference: 100% CPU, no GPU required; powered by llama.cpp with ARM NEON optimizations
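As a rough sanity check on the memory figure (an estimate, not a measured value): a 26B-parameter model at a typical ~4.5 bits per weight for a Q4_K_M-style GGUF quantization needs on the order of 14 GiB for the weights, leaving room within 24 GB for the KV cache, runtime, and OS.

```python
# Back-of-envelope estimate of the quantized weight footprint.
# Assumptions: 26e9 parameters, ~4.5 bits/weight (typical of a Q4_K_M GGUF);
# KV cache and runtime overhead come on top of this figure.
params = 26e9
bits_per_weight = 4.5
weights_gib = params * bits_per_weight / 8 / 1024**3
print(f"approx. weight footprint: {weights_gib:.1f} GiB")  # ~13.6 GiB
```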

🔐 Features

  • Fully offline inference, no external API required
  • Hosted with NGINX + HTTPS on Oracle Cloud (Free Tier)
  • Powered by llama.cpp using the GGUF model format
  • Interactive web UI built with Streamlit (a minimal sketch follows after this list)
  • Low-latency responses and complete data privacy
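A minimal Streamlit chat loop in this style might look like the sketch below (it assumes the hypothetical llama-cpp-python setup shown earlier; not the project's actual UI code):

```python
# Minimal Streamlit chat UI sketch: keeps the conversation in session state
# and calls the locally loaded model for each new user prompt.
import streamlit as st
from llama_cpp import Llama

@st.cache_resource  # load the model once per server process
def load_model():
    # hypothetical path and settings
    return Llama(model_path="models/gemma-26b-q4_k_m.gguf", n_ctx=4096, n_threads=4)

llm = load_model()

if "messages" not in st.session_state:
    st.session_state.messages = []

# replay the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# handle a new prompt
if prompt := st.chat_input("Ask something..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    result = llm.create_chat_completion(messages=st.session_state.messages)
    answer = result["choices"][0]["message"]["content"]
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```

Caching the model with st.cache_resource matters here: Streamlit reruns the script on every interaction, and reloading multi-gigabyte GGUF weights each time would dominate response latency.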