OpenCode + ByteShape:
Run a Local Coding Agent on Your Machine

Published by Ali Hadi Zadeh • 2 April 2026 • Tutorial

Totally necessary demo

Yes, it built a Flappy Bird clone.

Look, it's not curing cancer. But once your local coding agent is running, it can tackle the actually useful stuff: refactoring, debugging, writing tests, building real features, all on your hardware, for free, forever. A silly game just happens to be a great one-prompt stress test.

Hover over the game and press Space, or just tap/click it to start.

Hey! If you can open a terminal and follow steps, you're in the right place. This guide is for people who are new to LLMs, local inference, and all the jargon. We'll unpack it as we go.

Short version: ByteShape gives you the optimized model; LM Studio, llama.cpp, or Ollama runs it on your machine; and OpenCode provides the agentic coding interface, letting the model work through your instructions and interact directly with your code from the terminal. Three pieces, one workflow.

What on earth are we installing?

The architecture

ByteShape weights — optimized GGUF model files
        ↓
LM Studio (install & serve)  OR  llama.cpp (build & serve)  OR  Ollama (pull & serve)
        ↓
OpenCode — coding agent · TUI

ByteShape ships optimized model weights (as GGUF files) tuned for speed and efficiency. You still need an inference engine to load those weights; that's LM Studio, llama.cpp, or Ollama. On top of the inference engine sits OpenCode, the coding agent that talks to your model over a local API.

What is OpenCode?

OpenCode is an open-source terminal UI (TUI) coding agent. It talks to your model over normal APIs, can use tools (edit files, run commands, etc.), and reads its config from ~/.config/opencode/opencode.jsonc. It's the pilot, not the engine. The engine is LM Studio, llama.cpp, or Ollama.
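To give you a feel for that config file, here's a sketch of how a local engine might be registered as a provider. The provider key, display names, model ID, and port are placeholders (1234 is LM Studio's default server port) — treat this as illustrative and check OpenCode's provider documentation for the exact schema:

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    // "lmstudio" is an arbitrary key we chose for this local provider
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (local)",
      "options": {
        // LM Studio's OpenAI-compatible server, default port 1234
        "baseURL": "http://localhost:1234/v1"
      },
      "models": {
        // placeholder identifier — use whatever your engine reports
        "local-coder": { "name": "Local coder model" }
      }
    }
  }
}
```

Point `baseURL` at whichever engine you end up running; the rest of the file stays the same.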

LM Studio vs llama.cpp vs Ollama — pick your vibe

All three inference engines load the same GGUF weights and expose a local HTTP API for OpenCode to talk to. The difference is in how much setup you want and how much control you need.

|          | LM Studio | llama.cpp | Ollama |
|----------|-----------|-----------|--------|
| Vibe     | "Easy to use and runs everything" | "I want every knob and dial" | "I want it to Just Work" |
| Setup    | Install, browse & download model, load | Build or use releases, tune flags | Install, pull model, go |
| Best for | Broad model compatibility with a friendly CLI | KV cache tweaks, layers, server defaults | Getting started fast (but not all models are supported) |
  • LM Studio: Desktop app with a CLI (lms). Browse and download models interactively, load them with a fixed identifier for consistent API usage, and start an OpenAI-compatible server. Works on Mac, Linux, and Windows (WSL2).
  • llama.cpp: Lower-level, super capable. Its llama-server component speaks an OpenAI-style API and exposes CLI options for context size, batching, KV cache quantization, GPU layers, and more. If you love reading --help, you'll be happy here.
  • Ollama: Downloads models, keeps things tidy, and exposes a local HTTP API. Minimal setup, but it doesn't support the Qwen3.5 family yet, so in this tutorial it's only an option for the Coder 30B model.
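Whichever engine you pick, they all speak the same OpenAI-style chat API, which is exactly why OpenCode can treat them interchangeably. As a rough sketch, the request body an agent sends to `POST /v1/chat/completions` looks like this (the model name `local-coder` is a placeholder, not a real identifier):

```python
import json

# Illustrative sketch of the OpenAI-style chat request an agent like
# OpenCode sends to a local engine. LM Studio, llama-server, and Ollama
# all accept this shape on their OpenAI-compatible endpoints.
def build_chat_request(prompt: str, model: str = "local-coder") -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # agents stream tokens as they're generated
    }

print(json.dumps(build_chat_request("Refactor utils.py"), indent=2))
```

Because the request shape is identical everywhere, switching engines later means changing a base URL, not rewriting anything.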

KV cache in one line: While generating text, the model remembers past tokens in a cache. Tuning how that cache is stored can save VRAM. llama.cpp tends to expose the most options here.
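To make that concrete, here's a back-of-envelope sketch of KV cache memory. The layer and head counts below are illustrative placeholders, not the specs of any model in this tutorial, and q8_0 is approximated as 1 byte per element:

```python
# Rough KV cache sizing: K and V are each stored per layer, per KV head,
# per position in the context window.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical model: 48 layers, 8 KV heads (GQA), head dim 128, 32K context
fp16 = kv_cache_bytes(48, 8, 128, 32768, 2)  # fp16 cache: 2 bytes/element
q8 = kv_cache_bytes(48, 8, 128, 32768, 1)    # ~q8_0 cache: ≈1 byte/element

print(f"fp16 KV cache: {fp16 / 2**30:.1f} GiB")  # 6.0 GiB
print(f"q8   KV cache: {q8 / 2**30:.1f} GiB")    # 3.0 GiB
```

Halving the cache's precision roughly halves its VRAM footprint, which is why llama.cpp's cache-type flags matter at long context lengths.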

Configure your setup

[Interactive selector: pick a model, inference engine, and OS above to see step-by-step instructions for your setup.]