OpenCode + ByteShape:
Run a Local Coding Agent on Your Machine
Published by Ali Hadi Zadeh • 2 April 2026 • Tutorial
Hey! If you can open a terminal and follow steps, you're in the right place. This guide is for people who are new to LLMs, local inference, and all the jargon. We'll unpack it as we go.
Short version: ByteShape gives you the optimized model; LM Studio, llama.cpp, or Ollama runs it on your machine; and OpenCode provides the agentic coding interface, letting the model work through your instructions and interact directly with your code from the terminal. Three pieces, one workflow.
What on earth are we installing?
The architecture
ByteShape ships optimized model weights (as GGUF files) tuned for speed and efficiency. You still need an inference engine to load those weights; that's LM Studio, llama.cpp, or Ollama. On top of the inference engine sits OpenCode, the coding agent that talks to your model over a local API.
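All three engines expose the same OpenAI-compatible chat-completions endpoint, which is why OpenCode can sit on top of any of them. As a sketch (the `baseURL` and model identifier here are examples, not fixed values), the request OpenCode sends looks roughly like this:

```python
import json

# Shape of an OpenAI-compatible chat-completions request, as sent by a
# coding agent to a local inference engine. baseURL and model are examples
# that depend entirely on which engine you run and what you name the model.
base_url = "http://127.0.0.1:8080/v1"  # llama-server's default in this guide
payload = {
    "model": "byteshape-qwen3-coder-30b",  # whatever identifier your engine exposes
    "messages": [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Add a --verbose flag to cli.py"},
    ],
    "temperature": 0.7,
    "stream": True,  # agents stream tokens as they are generated
}
body = json.dumps(payload)
```

Swap the engine and only `base_url` and `model` change; the payload shape stays the same.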
What is OpenCode?
OpenCode is an open-source terminal UI (TUI) coding agent. It talks to your model
over normal APIs, can use tools (edit files, run commands, etc.), and reads its config from
~/.config/opencode/opencode.jsonc. It's the pilot, not the engine.
The engine is LM Studio, llama.cpp, or Ollama.
LM Studio vs llama.cpp vs Ollama — pick your vibe
All three inference engines load the same GGUF weights and expose a local HTTP API for OpenCode to talk to. The difference is in how much setup you want and how much control you need.
| | LM Studio | llama.cpp | Ollama |
|---|---|---|---|
| Vibe | "Easy to use and runs everything" | "I want every knob and dial" | "I want it to Just Work" |
| Setup | Install, browse & download model, load | Build or use releases, tune flags | Install, pull model, go |
| Best for | Broad model compatibility with a friendly CLI | KV cache tweaks, layers, server defaults | Getting started fast (but not all models are supported) |
- LM Studio: Desktop app with a CLI (`lms`). Browse and download models interactively, load them with a fixed identifier for consistent API usage, and start an OpenAI-compatible server. Works on Mac, Linux, and Windows (WSL2).
- llama.cpp: Lower-level, super capable. Its `llama-server` component speaks an OpenAI-style API and exposes CLI options for context size, batching, KV cache quantization, GPU layers, and more. If you love reading `--help`, you'll be happy here.
- Ollama: Downloads models, keeps things tidy, exposes a local HTTP API. Minimal setup, but it does not support the Qwen3.5 family yet, so in this tutorial it is only available as an inference engine for Coder 30B.
KV cache in one line: While generating text, the model remembers past tokens in a cache. Tuning how that cache is stored can save VRAM. llama.cpp tends to expose the most options here.
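To see why that cache is worth tuning, here's a back-of-the-envelope estimate of KV cache size. The layer and head counts below are illustrative round numbers, not the real Qwen3-Coder dimensions:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value):
    # Keys AND values (hence the factor of 2) are stored per layer,
    # per KV head, per token in the context window.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

# Illustrative transformer: 48 layers, 8 KV heads, head_dim 128, 32K context.
fp16 = kv_cache_bytes(48, 8, 128, 32768, 2)  # fp16: 2 bytes per value
q8   = kv_cache_bytes(48, 8, 128, 32768, 1)  # 8-bit cache: ~1 byte per value

print(f"fp16 cache: {fp16 / 2**30:.1f} GiB")  # 6.0 GiB
print(f"8-bit cache: {q8 / 2**30:.1f} GiB")   # 3.0 GiB
```

Halving the bytes per cached value halves the cache's VRAM footprint, which is exactly the kind of knob llama.cpp exposes (its `--cache-type-k` / `--cache-type-v` flags, for example).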
Configure your setup
Just give me the commands
Don't feel like reading the full step-by-step? Here's the whole thing condensed into a single copy-paste block for each engine, platform, and model combination; grab the one that matches your setup. If anything breaks, scroll down to the detailed walkthrough for more context and troubleshooting help.
# --- Ollama on macOS (Qwen3-Coder-30B) ---
# Install Ollama
brew install ollama
# Start Ollama with 32K context
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
# (in a new terminal) Pull the model
ollama pull hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf
# Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.zshrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"model": "ollama/hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (local)",
"options": { "baseURL": "http://localhost:11434/v1" },
"models": {
"hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf": {
"name": "ByteShape Qwen3-Coder-30B IQ4_XS 4.20bpw"
}
}
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- Ollama on Linux/WSL2, GPU quant (Qwen3-Coder-30B IQ4_XS) ---
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama with 32K context
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
# (in a new terminal) Pull the model
ollama pull hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf
# Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.bashrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"model": "ollama/hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (local)",
"options": { "baseURL": "http://localhost:11434/v1" },
"models": {
"hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf": {
"name": "ByteShape Qwen3-Coder-30B IQ4_XS 4.20bpw"
}
}
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- Ollama on Linux/WSL2, CPU quant (Qwen3-Coder-30B Q3_K_M) ---
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama with 32K context
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
# (in a new terminal) Pull the model
ollama pull hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-Q3_K_M-3.31bpw.gguf
# Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.bashrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"model": "ollama/hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-Q3_K_M-3.31bpw.gguf",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (local)",
"options": { "baseURL": "http://localhost:11434/v1" },
"models": {
"hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-Q3_K_M-3.31bpw.gguf": {
"name": "ByteShape Qwen3-Coder-30B Q3_K_M 3.31bpw"
}
}
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- llama.cpp on macOS (Qwen3-Coder-30B) ---
# Install build tools
xcode-select --install
brew install cmake
# Build llama.cpp
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
# Start the server (downloads model on first run)
./build/bin/llama-server \
-hf byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf \
--host 127.0.0.1 --port 8080 \
-ngl 99 -c 32768 \
--repeat-penalty 1.05 --temp 0.7 --top-k 20 --top-p 0.8
# (in a new terminal) Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.zshrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"model": "llama/byteshape-iq4xs-4.20bpw",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama-server (local)",
"options": { "baseURL": "http://127.0.0.1:8080/v1" },
"models": {
"byteshape-iq4xs-4.20bpw": {
"name": "ByteShape Qwen3-Coder-30B IQ4_XS 4.20bpw"
}
}
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- llama.cpp on macOS (Qwen3.5-9B) ---
# Install build tools
xcode-select --install
brew install cmake
# Build llama.cpp
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
# Start the server (downloads model on first run)
./build/bin/llama-server \
-hf byteshape/Qwen3.5-9B-GGUF:Qwen3.5-9B-IQ4_XS-4.43bpw.gguf \
--host 127.0.0.1 --port 8080 \
-ngl 99 -c 32768 \
--temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 0 --repeat-penalty 1
# (in a new terminal) Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.zshrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"model": "llama/byteshape-q35-9b-iq4xs-4.43bpw",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama-server (local)",
"options": { "baseURL": "http://127.0.0.1:8080/v1" },
"models": {
"byteshape-q35-9b-iq4xs-4.43bpw": {
"name": "ByteShape Qwen3.5-9B IQ4_XS 4.43bpw"
}
}
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- llama.cpp on Linux/WSL2 with CUDA (Qwen3-Coder-30B) ---
# Install build tools
sudo apt update
sudo apt install -y build-essential git cmake
# Build llama.cpp with CUDA
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
# Start the server (downloads model on first run)
./build/bin/llama-server \
-hf byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf \
--host 127.0.0.1 --port 8080 \
-ngl 99 -c 32768 \
--repeat-penalty 1.05 --temp 0.7 --top-k 20 --top-p 0.8
# (in a new terminal) Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.bashrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"model": "llama/byteshape-iq4xs-4.20bpw",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama-server (local)",
"options": { "baseURL": "http://127.0.0.1:8080/v1" },
"models": {
"byteshape-iq4xs-4.20bpw": {
"name": "ByteShape Qwen3-Coder-30B IQ4_XS 4.20bpw"
}
}
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- llama.cpp on Linux/WSL2 with CUDA (Qwen3.5-9B) ---
# Install build tools
sudo apt update
sudo apt install -y build-essential git cmake
# Build llama.cpp with CUDA
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
# Start the server (downloads model on first run)
./build/bin/llama-server \
-hf byteshape/Qwen3.5-9B-GGUF:Qwen3.5-9B-IQ4_XS-4.43bpw.gguf \
--host 127.0.0.1 --port 8080 \
-ngl 99 -c 32768 \
--temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 0 --repeat-penalty 1
# (in a new terminal) Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.bashrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"model": "llama/byteshape-q35-9b-iq4xs-4.43bpw",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama-server (local)",
"options": { "baseURL": "http://127.0.0.1:8080/v1" },
"models": {
"byteshape-q35-9b-iq4xs-4.43bpw": {
"name": "ByteShape Qwen3.5-9B IQ4_XS 4.43bpw"
}
}
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- llama.cpp on Linux/WSL2, CPU-only (Qwen3-Coder-30B) ---
# Install build tools
sudo apt update
sudo apt install -y build-essential git cmake
# Build llama.cpp (CPU)
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
# Start the server (downloads model on first run)
./build/bin/llama-server \
-hf byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-Q3_K_M-3.31bpw.gguf \
--host 127.0.0.1 --port 8080 \
-c 32768 \
--repeat-penalty 1.05 --temp 0.7 --top-k 20 --top-p 0.8
# (in a new terminal) Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.bashrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"model": "llama/byteshape-q3km-3.31bpw",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama-server (local)",
"options": { "baseURL": "http://127.0.0.1:8080/v1" },
"models": {
"byteshape-q3km-3.31bpw": {
"name": "ByteShape Qwen3-Coder-30B Q3_K_M 3.31bpw"
}
}
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- llama.cpp on Linux/WSL2, CPU-only (Qwen3.5-9B) ---
# Install build tools
sudo apt update
sudo apt install -y build-essential git cmake
# Build llama.cpp (CPU)
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
# Start the server (downloads model on first run)
./build/bin/llama-server \
-hf byteshape/Qwen3.5-9B-GGUF:Qwen3.5-9B-IQ4_XS-4.20bpw.gguf \
--host 127.0.0.1 --port 8080 \
-c 32768 \
--temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 0 --repeat-penalty 1
# (in a new terminal) Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.bashrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"model": "llama/byteshape-q35-9b-iq4xs-4.20bpw",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama-server (local)",
"options": { "baseURL": "http://127.0.0.1:8080/v1" },
"models": {
"byteshape-q35-9b-iq4xs-4.20bpw": {
"name": "ByteShape Qwen3.5-9B IQ4_XS 4.20bpw"
}
}
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- LM Studio on macOS (Qwen3-Coder-30B) ---
# Install LM Studio CLI
curl -fsSL https://lmstudio.ai/install.sh | bash
export PATH="$HOME/.lmstudio/bin:$PATH"
# Start daemon and download model (interactive selection)
lms daemon up
lms get byteshape
# → Select: byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF
# → Pick your preferred quant
# Load with a fixed identifier (Metal GPU auto-detected on Mac)
lms load qwen3-coder-30b --context-length 32768 --identifier "byteshape-qwen3-coder-30b" --gpu max
# Start the API server
lms server start
# Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.zshrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"lmstudio": {
"name": "LM Studio",
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://127.0.0.1:1234/v1",
"apiKey": "dummy"
},
"models": {
"byteshape-qwen3-coder-30b": {
"name": "byteshape-qwen3-coder-30b"
}
}
}
},
"model": "lmstudio/byteshape-qwen3-coder-30b",
"agent": {
"build": {
"model": "lmstudio/byteshape-qwen3-coder-30b",
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"repeat_penalty": 1.05
},
"plan": {
"model": "lmstudio/byteshape-qwen3-coder-30b",
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"repeat_penalty": 1.05
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- LM Studio on macOS (Qwen3.5-9B) ---
# Install LM Studio CLI
curl -fsSL https://lmstudio.ai/install.sh | bash
export PATH="$HOME/.lmstudio/bin:$PATH"
# Start daemon and download model (interactive selection)
lms daemon up
lms get byteshape
# → Select: byteshape/Qwen3.5-9B-GGUF
# → Pick your preferred quant
# Load with a fixed identifier (Metal GPU auto-detected on Mac)
lms load qwen3.5-9b --context-length 32768 --identifier "byteshape-qwen3.5-9b" --gpu max
# Start the API server
lms server start
# Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.zshrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"lmstudio": {
"name": "LM Studio",
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://127.0.0.1:1234/v1",
"apiKey": "dummy"
},
"models": {
"byteshape-qwen3.5-9b": {
"name": "byteshape-qwen3.5-9b"
}
}
}
},
"model": "lmstudio/byteshape-qwen3.5-9b",
"agent": {
"build": {
"model": "lmstudio/byteshape-qwen3.5-9b",
"temperature": 0.6,
"top_p": 0.95,
"top_k": 20,
"min_p": 0,
"presence_penalty": 0,
"repeat_penalty": 1
},
"plan": {
"model": "lmstudio/byteshape-qwen3.5-9b",
"temperature": 0.6,
"top_p": 0.95,
"top_k": 20,
"min_p": 0,
"presence_penalty": 0,
"repeat_penalty": 1
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- LM Studio on Linux/WSL2, GPU (Qwen3-Coder-30B) ---
# Install dependencies
sudo apt-get update && sudo apt-get install -y libatomic1 libgomp1 curl
# Install LM Studio CLI
curl -fsSL https://lmstudio.ai/install.sh | bash
export PATH="$HOME/.lmstudio/bin:$PATH"
# Start daemon and download model (interactive selection)
lms daemon up
lms get byteshape
# → Select: byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF
# → Pick your preferred quant
# Load with GPU offload and a fixed identifier
lms load qwen3-coder-30b --context-length 32768 --identifier "byteshape-qwen3-coder-30b" --gpu max
# Start the API server
lms server start
# Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.bashrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"lmstudio": {
"name": "LM Studio",
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://127.0.0.1:1234/v1",
"apiKey": "dummy"
},
"models": {
"byteshape-qwen3-coder-30b": {
"name": "byteshape-qwen3-coder-30b"
}
}
}
},
"model": "lmstudio/byteshape-qwen3-coder-30b",
"agent": {
"build": {
"model": "lmstudio/byteshape-qwen3-coder-30b",
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"repeat_penalty": 1.05
},
"plan": {
"model": "lmstudio/byteshape-qwen3-coder-30b",
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"repeat_penalty": 1.05
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- LM Studio on Linux/WSL2, GPU (Qwen3.5-9B) ---
# Install dependencies
sudo apt-get update && sudo apt-get install -y libatomic1 libgomp1 curl
# Install LM Studio CLI
curl -fsSL https://lmstudio.ai/install.sh | bash
export PATH="$HOME/.lmstudio/bin:$PATH"
# Start daemon and download model (interactive selection)
lms daemon up
lms get byteshape
# → Select: byteshape/Qwen3.5-9B-GGUF
# → Pick your preferred quant
# Load with GPU offload and a fixed identifier
lms load qwen3.5-9b --context-length 32768 --identifier "byteshape-qwen3.5-9b" --gpu max
# Start the API server
lms server start
# Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.bashrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"lmstudio": {
"name": "LM Studio",
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://127.0.0.1:1234/v1",
"apiKey": "dummy"
},
"models": {
"byteshape-qwen3.5-9b": {
"name": "byteshape-qwen3.5-9b"
}
}
}
},
"model": "lmstudio/byteshape-qwen3.5-9b",
"agent": {
"build": {
"model": "lmstudio/byteshape-qwen3.5-9b",
"temperature": 0.6,
"top_p": 0.95,
"top_k": 20,
"min_p": 0,
"presence_penalty": 0,
"repeat_penalty": 1
},
"plan": {
"model": "lmstudio/byteshape-qwen3.5-9b",
"temperature": 0.6,
"top_p": 0.95,
"top_k": 20,
"min_p": 0,
"presence_penalty": 0,
"repeat_penalty": 1
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- LM Studio on Linux/WSL2, CPU-only (Qwen3-Coder-30B) ---
# Install dependencies
sudo apt-get update && sudo apt-get install -y libatomic1 libgomp1 curl
# Install LM Studio CLI
curl -fsSL https://lmstudio.ai/install.sh | bash
export PATH="$HOME/.lmstudio/bin:$PATH"
# Start daemon and download model (interactive selection)
lms daemon up
lms get byteshape
# → Select: byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF
# → Pick your preferred quant
# Load (CPU, no --gpu flag)
lms load qwen3-coder-30b --context-length 32768 --identifier "byteshape-qwen3-coder-30b"
# Start the API server
lms server start
# Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.bashrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"lmstudio": {
"name": "LM Studio",
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://127.0.0.1:1234/v1",
"apiKey": "dummy"
},
"models": {
"byteshape-qwen3-coder-30b": {
"name": "byteshape-qwen3-coder-30b"
}
}
}
},
"model": "lmstudio/byteshape-qwen3-coder-30b",
"agent": {
"build": {
"model": "lmstudio/byteshape-qwen3-coder-30b",
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"repeat_penalty": 1.05
},
"plan": {
"model": "lmstudio/byteshape-qwen3-coder-30b",
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"repeat_penalty": 1.05
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
# --- LM Studio on Linux/WSL2, CPU-only (Qwen3.5-9B) ---
# Install dependencies
sudo apt-get update && sudo apt-get install -y libatomic1 libgomp1 curl
# Install LM Studio CLI
curl -fsSL https://lmstudio.ai/install.sh | bash
export PATH="$HOME/.lmstudio/bin:$PATH"
# Start daemon and download model (interactive selection)
lms daemon up
lms get byteshape
# → Select: byteshape/Qwen3.5-9B-GGUF
# → Pick your preferred quant
# Load (CPU, no --gpu flag)
lms load qwen3.5-9b --context-length 32768 --identifier "byteshape-qwen3.5-9b"
# Start the API server
lms server start
# Install OpenCode
curl -fsSL https://opencode.ai/install | bash
source ~/.bashrc
# Write the config
mkdir -p ~/.config/opencode
cat <<'EOF' > ~/.config/opencode/opencode.jsonc
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"lmstudio": {
"name": "LM Studio",
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://127.0.0.1:1234/v1",
"apiKey": "dummy"
},
"models": {
"byteshape-qwen3.5-9b": {
"name": "byteshape-qwen3.5-9b"
}
}
}
},
"model": "lmstudio/byteshape-qwen3.5-9b",
"agent": {
"build": {
"model": "lmstudio/byteshape-qwen3.5-9b",
"temperature": 0.6,
"top_p": 0.95,
"top_k": 20,
"min_p": 0,
"presence_penalty": 0,
"repeat_penalty": 1
},
"plan": {
"model": "lmstudio/byteshape-qwen3.5-9b",
"temperature": 0.6,
"top_p": 0.95,
"top_k": 20,
"min_p": 0,
"presence_penalty": 0,
"repeat_penalty": 1
}
}
}
EOF
# Launch: from an empty project folder
mkdir -p ~/my-project && cd ~/my-project
opencode
Something not working? Check the step-by-step below for more details.
Part 1: Set up your environment
First: Set up WSL2
For this tutorial, everything on Windows happens inside WSL2 (Windows Subsystem for Linux). LM Studio, llama.cpp, Ollama, and OpenCode all run inside an Ubuntu terminal, so once WSL2 is set up, the Linux instructions apply to you too.
Step 1
Open PowerShell as Administrator and run:
wsl --install
This installs WSL2 with Ubuntu by default. If you already have WSL but it's version 1, upgrade it:
wsl --set-default-version 2
Step 2
Restart your computer when prompted.
Step 3
After reboot, Ubuntu will launch and ask you to create a username and password. This is your Linux user inside WSL, totally separate from your Windows login.
Step 4
Once you're at the Ubuntu prompt, update packages:
sudo apt update && sudo apt upgrade -y
You now have a real Linux terminal. Everything below runs right here.
Install Nvidia drivers on Linux
CPU-only works, but big models get snail-like without GPU acceleration. Install a recent proprietary driver:
Open Software & Updates → Additional Drivers → pick the recommended NVIDIA driver → click Apply Changes → reboot.
Then verify in a terminal:
nvidia-smi
If you see your GPU and driver version listed, you're set.
Nvidia GPU passthrough for WSL2
The NVIDIA driver is installed on the Windows side, not inside WSL. WSL2 picks it up automatically once it's there.
- Download from nvidia.com/Download, or use GeForce Experience / Studio Driver channel.
- Reboot if the installer asks.
- After reboot, open your WSL2 Ubuntu terminal and run
nvidia-smi. If you see your GPU listed, GPU passthrough is working.
Make sure you're on Windows 10 21H2+ or Windows 11, and that you installed the Windows GPU driver
and not a Linux .run driver inside WSL, which will break things.
Install Ollama on macOS
Choose either option:
# Option A: Homebrew
brew install ollama
# Option B: Download the app
# Visit https://ollama.com/download and run the installer
After installing, Ollama runs as a menu bar app. We'll start it properly with the right context length in Part 2.
Install Ollama on Linux
curl -fsSL https://ollama.com/install.sh | sh
That's the install done. We'll start the server with the right context length in Part 2.
Install Ollama inside WSL2
Inside your Ubuntu WSL terminal, run the Linux install script:
curl -fsSL https://ollama.com/install.sh | sh
That's the install done. We'll start the server with the right context length in Part 2.
llama.cpp is the open-source inference engine we'll build from source.
It ships a component called llama-server that exposes an OpenAI-compatible
HTTP API; that's the piece OpenCode talks to. In the steps below we build the whole
project, then use llama-server to load and serve the model.
Build llama.cpp on macOS
Step A: Install build tools
Install Xcode Command Line Tools (gives you git, clang, make):
xcode-select --install
Then install CMake via Homebrew (easiest):
brew install cmake
No Homebrew? Download a macOS binary from
cmake.org/download
and put cmake on your PATH.
Build llama.cpp
Step A: Install build tools
sudo apt update
sudo apt install -y build-essential git cmake
Step B: Get the source
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
Step C: Build
On macOS, Metal (Apple GPU acceleration) is enabled by default, so the same CMake commands below give you GPU offload without extra flags.
cmake -B build
cmake --build build --config Release -j
Check the binary:
./build/bin/llama-server --help
A wall of flags = you're golden.
Step C: Build (CPU)
cmake -B build
cmake --build build --config Release -j
The -j flag builds in parallel (much faster on multi-core machines).
Check the binary:
./build/bin/llama-server --help
If you see a wall of flags and options, you're all set.
Step C: Install CUDA Toolkit
You need the
CUDA Toolkit
installed so nvcc is available. On WSL2, install it inside the Ubuntu environment
(not on the Windows side), and when downloading from Nvidia's website, make sure you select
Linux as the operating system, not Windows.
After the installer finishes, nvcc must be on your PATH for the
build to find it:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Confirm it works:
nvcc --version
Verify nvidia-smi shows your GPU before continuing, then:
Step D: Build with CUDA
Delete any old build directory first, then:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
If CMake can't detect your GPU architecture, see the
CMAKE_CUDA_ARCHITECTURES section in docs/build.md.
Check the binary:
./build/bin/llama-server --help
Install LM Studio on macOS
Step A: Install LM Studio
curl -fsSL https://lmstudio.ai/install.sh | bash
Restart your shell or add the CLI to your current session:
export PATH="$HOME/.lmstudio/bin:$PATH"
Verify:
lms --version
Install LM Studio on Linux
Step A: Install dependencies
LM Studio needs a few system libraries. Install them first:
sudo apt-get update && sudo apt-get install -y libatomic1 libgomp1 curl
If `lms` complains about missing libraries later, come back and install them.
Step B: Install LM Studio
curl -fsSL https://lmstudio.ai/install.sh | bash
Restart your shell or add the CLI to your current session:
export PATH="$HOME/.lmstudio/bin:$PATH"
Verify:
lms --version
Install LM Studio inside WSL2
Step A: Install dependencies
Inside your Ubuntu WSL terminal:
sudo apt-get update && sudo apt-get install -y libatomic1 libgomp1 curl
Step B: Install LM Studio
curl -fsSL https://lmstudio.ai/install.sh | bash
Restart your shell or add the CLI to your current session:
export PATH="$HOME/.lmstudio/bin:$PATH"
Verify:
lms --version
Part 2: Download and run your ByteShape model
Which model file?
Picking the right quantization for your setup is something of an art. Every combination of hardware, inference engine, and data type behaves differently, and the "best" quant is the one that gives you the highest quality at a speed you can actually work with.
One of ByteShape's contributions is that we rigorously evaluate the quality and performance of every single quantization we publish, and provide interactive graphs so you can find the sweet spot for your exact hardware.
Head over to our Qwen3-Coder-30B analysis to explore the full results. The general rule: pick a hardware chart that matches your system, then choose the highest-quality quant that still runs at a comfortable speed.
For this tutorial we've picked an example model to keep things concrete:
| Setup | Model file | Size |
|---|---|---|
| GPU (Nvidia / Apple Silicon) | Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf | 16 GB |
| CPU-only | Qwen3-Coder-30B-A3B-Instruct-Q3_K_M-3.31bpw.gguf | 12.7 GB |
You should be able to drop in any other quant from the ByteShape repo and follow the same instructions; just swap the filename.
Head over to our Qwen3.5-9B analysis to explore the full results. The general rule: pick a hardware chart that matches your system, then choose the highest-quality quant that still runs at a comfortable speed.
For this tutorial we've picked an example model to keep things concrete:
| Setup | Model file | Size |
|---|---|---|
| GPU (Nvidia / Apple Silicon) | Qwen3.5-9B-IQ4_XS-4.43bpw.gguf | 4.97 GB |
| CPU-only | Qwen3.5-9B-IQ4_XS-4.20bpw.gguf | 4.71 GB |
You should be able to drop in any other quant from the ByteShape repo and follow the same instructions; just swap the filename.
Use the full filename. The instructions below use the full string including the model base name and bpw. LM Studio, llama.cpp, and Ollama all need this exact format to resolve the file.
Pull and run with Ollama
Step 1: Start Ollama
What is context length? It's how much text the model can "see" at once: your prompt, the conversation history, open files, and tool results all count against this budget. Coding agents are heavy consumers: OpenCode's system prompt alone is roughly 10K tokens, so that budget is spoken for before you even type your first message. 32K is a reasonable starting point, but you may need more for serious work on larger repositories. Adjust based on your available memory and workload.
The easiest way on Mac is the Ollama menu bar app. Open its Settings and drag the context length slider to 32K (or higher if your system can handle it). If you prefer the terminal, run:
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
Either way sets the per-request ceiling for every model Ollama serves. See the Ollama context length docs for more details.
Start Ollama in a dedicated terminal and keep it running in the background:
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
The OLLAMA_CONTEXT_LENGTH variable sets the per-request ceiling for every model
Ollama serves, so you won't need to set it anywhere else.
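The budget math behind that 32K recommendation is simple enough to sketch. The ~10K system-prompt figure is this article's estimate, and the output reservation below is an illustrative number:

```python
context_len = 32768        # OLLAMA_CONTEXT_LENGTH set above
system_prompt = 10_000     # rough size of OpenCode's system prompt (article's estimate)
reserved_output = 4_096    # leave room for the model's replies (illustrative)

available = context_len - system_prompt - reserved_output
print(f"~{available:,} tokens left for code, history, and tool results")
# With a 32K window, roughly 18-19K tokens remain for actual work.
```

Double the window to 64K and the working budget roughly triples, which is why more context pays off on larger repositories.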
Step 2: Pull the model
Use the hf.co/ prefix so Ollama fetches directly from Hugging Face:
ollama pull hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf
ollama pull hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-Q3_K_M-3.31bpw.gguf
Step 3: Verify
ollama list
The model name appears exactly as you typed it, including the hf.co/ prefix. Keep that full string handy for the config.
Start llama-server (GPU)
llama-server downloads the model on first run, no manual download needed.
The -hf argument follows the form owner/repo:filename.
./build/bin/llama-server \
-hf byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf \
--host 127.0.0.1 --port 8080 \
-ngl 99 -c 32768 \
--repeat-penalty 1.05 --temp 0.7 --top-k 20 --top-p 0.8
./build/bin/llama-server \
-hf byteshape/Qwen3.5-9B-GGUF:Qwen3.5-9B-IQ4_XS-4.43bpw.gguf \
--host 127.0.0.1 --port 8080 \
-ngl 99 -c 32768 \
--temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 0 --repeat-penalty 1
- -ngl 99 offloads all layers to the GPU. Lower this number if you run out of VRAM.
- -c 32768 is the context window. This controls how much text the model can hold in view at once: prompt, history, open files, and tool results all count. OpenCode's system prompt alone is ~10K tokens, so 32K is a reasonable starting point, but for larger repositories you'll want more. If you have spare VRAM, bump this to 64K or higher.
- Sampling flags (--repeat-penalty, --temp, --top-k, --top-p) shape how the model picks its next token. These values are tuned for instruction-following and code tasks. Leave them as-is unless you have a specific reason to change them.
- First run downloads the model to a local cache. Subsequent starts are instant.
- One model per server process. To switch, stop the server and restart with a different -hf value.
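If you're unsure whether full offload will fit, here's a rough back-of-envelope for the weights alone. KV cache and activations need several GB on top, and these numbers are approximations based on the model's name (30B parameters at 4.20 bits per weight), not measurements:

```shell
# Weights-only VRAM estimate: params (billions) x bits-per-weight / 8 = GB.
# Shell arithmetic is integer-only, so bits-per-weight is scaled by 100.
PARAMS_B=30    # 30B parameters
BPW_X100=420   # 4.20 bits per weight, x100
echo "$((PARAMS_B * BPW_X100 / 800)) GB (approx., weights only)"
```

Around 15-16 GB for the weights, so a 24 GB card handles full offload comfortably; on a 16 GB card you'd lower -ngl and let some layers spill to CPU.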
Start llama-server (CPU)
llama-server downloads the model on first run; no manual download needed.
The -hf argument follows the form owner/repo:filename.
./build/bin/llama-server \
-hf byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-Q3_K_M-3.31bpw.gguf \
--host 127.0.0.1 --port 8080 \
-c 32768 \
--repeat-penalty 1.05 --temp 0.7 --top-k 20 --top-p 0.8
./build/bin/llama-server \
-hf byteshape/Qwen3.5-9B-GGUF:Qwen3.5-9B-IQ4_XS-4.20bpw.gguf \
--host 127.0.0.1 --port 8080 \
-c 32768 \
--temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 0 --repeat-penalty 1
- -c 32768 is the context window. This controls how much text the model can hold in view at once: prompt, history, open files, and tool results all count. OpenCode's system prompt alone is ~10K tokens, so 32K is a reasonable starting point, but for larger repositories you'll want more.
- Sampling flags (--repeat-penalty, --temp, --top-k, --top-p) shape how the model picks its next token. These values are tuned for instruction-following and code tasks. Leave them as-is unless you have a specific reason to change them.
- First run downloads the model to a local cache. Subsequent starts are instant.
- One model per server process. To switch, stop the server and restart with a different -hf value.
Download and load with LM Studio
Step 1: Start the LM Studio daemon
lms daemon up
Step 2: Download a ByteShape model
The lms get command lets you browse and download models interactively.
Run it with the ByteShape publisher name:
lms get byteshape
This shows all available ByteShape models. Use the arrow keys to select your model, then pick a quantization.
Select byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF, then choose a quant that fits your hardware.
Select byteshape/Qwen3.5-9B-GGUF, then choose a quant that fits your hardware.
Step 3: Load the model
After download, load the model with a fixed identifier. The identifier gives the model a stable API name so your OpenCode config stays the same even if you switch quants later.
Important: If you downloaded multiple variants of the same model, the load command
may need a tag to disambiguate. If only one variant was downloaded, the short name works.
Check lms ls to see what's available.
lms load qwen3-coder-30b --context-length 32768 --identifier "byteshape-qwen3-coder-30b" --gpu max
lms load qwen3.5-9b --context-length 32768 --identifier "byteshape-qwen3.5-9b" --gpu max
lms load qwen3-coder-30b --context-length 32768 --identifier "byteshape-qwen3-coder-30b"
lms load qwen3.5-9b --context-length 32768 --identifier "byteshape-qwen3.5-9b"
- --context-length 32768 sets the context window. 32K is a starting point; increase it if your system has the memory and your projects are large.
- --identifier gives the model a stable name for the API. Your OpenCode config references this identifier.
- --gpu max offloads all layers to the GPU. Omit this flag for CPU-only inference.
Step 4: Verify the model is loaded
lms ps
You should see your model listed with its identifier.
Step 5: Start the API server
lms server start
LM Studio now serves an OpenAI-compatible API on http://127.0.0.1:1234/v1.
Keep this running while you use OpenCode.
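If you want to double-check the endpoint before moving on, this small sketch pings the /v1/models route and prints a one-line status. The URL and port assume the LM Studio defaults above:

```shell
# Reachability check for the local LM Studio API (adjust the URL if you
# changed the port). Prints one status line either way.
python3 - <<'EOF'
import urllib.request
try:
    urllib.request.urlopen("http://127.0.0.1:1234/v1/models", timeout=2)
    print("LM Studio API: reachable")
except OSError:
    print("LM Studio API: not reachable (is 'lms server start' running?)")
EOF
```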
Part 3: Install OpenCode and configure your model
Install OpenCode
curl -fsSL https://opencode.ai/install | bash
Restart your terminal (or run source ~/.bashrc / source ~/.zshrc)
so the opencode command is on your PATH. Quick check:
opencode --version
If a version number prints, you're good.
Configure OpenCode for Ollama
OpenCode reads its config from ~/.config/opencode/opencode.jsonc.
The config has two main pieces:
- provider: tells OpenCode how to reach your inference server and what models it offers. You give it a short label (provider_id).
- model: the default model, written as provider_id/model_id. Both parts must match what you defined in provider.
{
"$schema": "https://opencode.ai/config.json",
"model": "ollama/hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (local)",
"options": {
"baseURL": "http://localhost:11434/v1"
},
"models": {
"hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf": {
"name": "ByteShape Qwen3-Coder-30B IQ4_XS 4.20bpw"
}
}
}
}
}
{
"$schema": "https://opencode.ai/config.json",
"model": "ollama/hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-Q3_K_M-3.31bpw.gguf",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (local)",
"options": {
"baseURL": "http://localhost:11434/v1"
},
"models": {
"hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-Q3_K_M-3.31bpw.gguf": {
"name": "ByteShape Qwen3-Coder-30B Q3_K_M 3.31bpw"
}
}
}
}
}
To save this as your config file, run:
mkdir -p ~/.config/opencode
nano ~/.config/opencode/opencode.jsonc
This opens a terminal text editor called nano with an empty file. Paste the config above, then press Ctrl+O and Enter to save, followed by Ctrl+X to exit.
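To catch JSON typos before launching OpenCode, you can run the config through Python's JSON parser. The sketch below validates a trimmed copy of the example config so it's runnable as-is; to check your real file, point json.tool at ~/.config/opencode/opencode.jsonc instead (that works for these examples because they contain no // comments, which plain JSON parsers reject):

```shell
# Write a trimmed copy of the config to a temp file, then parse it.
# A typo (missing comma, stray quote) makes json.tool fail loudly.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
{
  "$schema": "https://opencode.ai/config.json",
  "model": "ollama/hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "hf.co/byteshape/Qwen3-Coder-30B-A3B-Instruct-GGUF:Qwen3-Coder-30B-A3B-Instruct-IQ4_XS-4.20bpw.gguf": {
          "name": "ByteShape Qwen3-Coder-30B IQ4_XS 4.20bpw"
        }
      }
    }
  }
}
EOF
python3 -m json.tool "$cfg" > /dev/null && echo "config parses"
```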
Configure OpenCode for llama.cpp
OpenCode reads its config from ~/.config/opencode/opencode.jsonc.
The config has two main pieces:
- provider: tells OpenCode how to reach your inference server and what models it offers. You give it a short label (provider_id).
- model: the default model, written as provider_id/model_id. Both parts must match what you defined in provider.
{
"$schema": "https://opencode.ai/config.json",
"model": "llama/byteshape-iq4xs-4.20bpw",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama-server (local)",
"options": { "baseURL": "http://127.0.0.1:8080/v1" },
"models": {
"byteshape-iq4xs-4.20bpw": {
"name": "ByteShape Qwen3-Coder-30B IQ4_XS 4.20bpw"
}
}
}
}
}
{
"$schema": "https://opencode.ai/config.json",
"model": "llama/byteshape-q35-9b-iq4xs-4.43bpw",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama-server (local)",
"options": { "baseURL": "http://127.0.0.1:8080/v1" },
"models": {
"byteshape-q35-9b-iq4xs-4.43bpw": {
"name": "ByteShape Qwen3.5-9B IQ4_XS 4.43bpw"
}
}
}
}
}
{
"$schema": "https://opencode.ai/config.json",
"model": "llama/byteshape-q3km-3.31bpw",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama-server (local)",
"options": { "baseURL": "http://127.0.0.1:8080/v1" },
"models": {
"byteshape-q3km-3.31bpw": {
"name": "ByteShape Qwen3-Coder-30B Q3_K_M 3.31bpw"
}
}
}
}
}
{
"$schema": "https://opencode.ai/config.json",
"model": "llama/byteshape-q35-9b-iq4xs-4.20bpw",
"provider": {
"llama": {
"npm": "@ai-sdk/openai-compatible",
"name": "llama-server (local)",
"options": { "baseURL": "http://127.0.0.1:8080/v1" },
"models": {
"byteshape-q35-9b-iq4xs-4.20bpw": {
"name": "ByteShape Qwen3.5-9B IQ4_XS 4.20bpw"
}
}
}
}
}
To save this as your config file, run:
mkdir -p ~/.config/opencode
nano ~/.config/opencode/opencode.jsonc
This opens a terminal text editor called nano with an empty file. Paste the config above, then press Ctrl+O and Enter to save, followed by Ctrl+X to exit.
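The most common config mistake is a model string that doesn't match a provider key. This sketch, run against a trimmed copy of the config above, shows the invariant: everything after provider_id/ in the default model must appear as a key under provider.models:

```shell
# Check that the default "model" (provider_id/model_id) points at a model
# actually defined under that provider. Trimmed copy of the config above.
python3 - <<'EOF'
import json

cfg = json.loads("""
{
  "model": "llama/byteshape-iq4xs-4.20bpw",
  "provider": {
    "llama": {
      "models": { "byteshape-iq4xs-4.20bpw": { "name": "ByteShape Qwen3-Coder-30B" } }
    }
  }
}
""")

provider_id, model_id = cfg["model"].split("/", 1)
assert model_id in cfg["provider"][provider_id]["models"], "model id mismatch"
print("model id matches provider entry")
EOF
```

The split("/", 1) matters: Ollama model ids contain slashes themselves (hf.co/...), so only the first slash separates provider from model.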
Configure OpenCode for LM Studio
OpenCode reads its config from ~/.config/opencode/opencode.jsonc.
With LM Studio the config includes a provider block pointing at the LM Studio server,
plus an agent section with sampling parameters tuned for coding tasks.
- provider: tells OpenCode how to reach LM Studio. The apiKey is required by the SDK but LM Studio ignores it; use any dummy value.
- model: written as lmstudio/<identifier>. The identifier must match the --identifier flag you used when loading the model.
- agent: sets sampling parameters (temperature, top_p, etc.) for the build and plan modes. These values are tuned for instruction-following and code tasks.
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"lmstudio": {
"name": "LM Studio",
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://127.0.0.1:1234/v1",
"apiKey": "dummy"
},
"models": {
"byteshape-qwen3-coder-30b": {
"name": "byteshape-qwen3-coder-30b"
}
}
}
},
"model": "lmstudio/byteshape-qwen3-coder-30b",
"agent": {
"build": {
"model": "lmstudio/byteshape-qwen3-coder-30b",
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"repeat_penalty": 1.05
},
"plan": {
"model": "lmstudio/byteshape-qwen3-coder-30b",
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"repeat_penalty": 1.05
}
}
}
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"lmstudio": {
"name": "LM Studio",
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://127.0.0.1:1234/v1",
"apiKey": "dummy"
},
"models": {
"byteshape-qwen3.5-9b": {
"name": "byteshape-qwen3.5-9b"
}
}
}
},
"model": "lmstudio/byteshape-qwen3.5-9b",
"agent": {
"build": {
"model": "lmstudio/byteshape-qwen3.5-9b",
"temperature": 0.6,
"top_p": 0.95,
"top_k": 20,
"min_p": 0,
"presence_penalty": 0,
"repeat_penalty": 1
},
"plan": {
"model": "lmstudio/byteshape-qwen3.5-9b",
"temperature": 0.6,
"top_p": 0.95,
"top_k": 20,
"min_p": 0,
"presence_penalty": 0,
"repeat_penalty": 1
}
}
}
To save this as your config file, run:
mkdir -p ~/.config/opencode
nano ~/.config/opencode/opencode.jsonc
This opens a terminal text editor called nano with an empty file. Paste the config above, then press Ctrl+O and Enter to save, followed by Ctrl+X to exit.
To switch models later without touching this config, run lms unload, download a new one, and re-load it with the same identifier.
Verify the config works
Start OpenCode from any directory:
opencode
Inside the TUI, press Ctrl+P to open the command menu, then choose
Switch model. Your models should appear by their display names.
Select one with Enter, type a short test message, and hit Enter.
If you get a reply, you're fully configured.
Troubleshooting
| Symptom | Likely cause |
|---|---|
| Model doesn't appear in the list | Typo in a provider or models key. Copy-paste is your friend |
| "Connection refused" error | Inference server isn't running on that port. Start it first |
| Response cuts off early | Context window on the server is too small, or the model ran out of VRAM. Increase -c / --context-length and try again |
| Ollama model not found | Model in config hasn't been pulled yet. Run ollama list to check |
| LM Studio: lms command not found | PATH not set. Run export PATH="$HOME/.lmstudio/bin:$PATH" or restart your shell |
| LM Studio: model identifier mismatch | The identifier in your config must exactly match the --identifier used in lms load. Run lms ps to check |
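Half of these rows boil down to "nothing is listening on that port." This bash sketch probes all three default ports in one go; it assumes bash's /dev/tcp feature and the default ports used in this guide (11434 Ollama, 8080 llama-server, 1234 LM Studio):

```shell
# Probe the default ports of all three inference engines.
# /dev/tcp is a bash feature, so run this with bash, not sh.
for port in 11434 8080 1234; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "port $port: open"
  else
    echo "port $port: closed"
  fi
done
```

Whichever port your config's baseURL points at should show as open; if it's closed, start that server first.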
Part 4: Let's actually use it, build Flappy Bird in one prompt
Everything is installed, the server is running, OpenCode is configured. Time for the fun part: let's make a game.
We're going to ask OpenCode to build a Flappy Bird clone as a single HTML file (HTML + CSS + JavaScript, no build tools needed). You'll go from an empty folder to a playable game in about 60 seconds.
Step 1: Create a project folder and open OpenCode
mkdir ~/flappy-bird && cd ~/flappy-bird
opencode
OpenCode starts up and you'll see the TUI, a chat-style interface right in your terminal.
Step 2: Pick your model
Press Ctrl+P to open the main command menu,
then choose Switch model. Use the arrow keys (or start typing) to find your ByteShape model,
then hit Enter.
Step 3: Send the prompt
Type or paste the following into the chat input and hit Enter:
Build a Flappy Bird clone as a single index.html file. Use an HTML5 canvas
for rendering. Include:
- A bird that flaps on click or spacebar
- Scrolling pipes with a gap to fly through
- Gravity and simple collision detection
- A score counter that goes up for each pipe cleared
- A "Game Over" screen with a restart option
Make it colorful and fun. No external dependencies, everything in one file.
Now sit back and watch. OpenCode will write the file in real time, using tools under the hood,
creating index.html, writing the JavaScript, and so on. Just let it cook.
Step 4: Play your game
Once OpenCode finishes, open the file in a browser:
open index.html       # macOS
xdg-open index.html   # Linux
Your game lives inside WSL2, but your browser runs on the Windows side.
The easiest way to open it is with explorer.exe, which WSL can call directly:
explorer.exe index.html
This launches your default Windows browser with the file. If it doesn't work (some WSL setups need a full path), copy the file to your Windows desktop instead:
cp index.html /mnt/c/Users/$(cmd.exe /c "echo %USERNAME%" 2>/dev/null | tr -d '\r')/Desktop/
explorer.exe "C:\Users\$(cmd.exe /c "echo %USERNAME%" 2>/dev/null | tr -d '\r')\Desktop\index.html"
Alternatively, open Windows File Explorer, type \\wsl$\Ubuntu into the address bar, and press Enter.
Navigate to home › <your-username> › flappy-bird and double-click index.html.
Click the window (or press spacebar) and you're playing Flappy Bird. That whole thing, from empty folder to a working game, was one prompt.
Step 5: Iterate (the best part)
Don't like the colors? Want a night mode? Wish the pipes were shaped like cacti? Just tell OpenCode:
Make the background a night sky with stars. Change the bird to a little rocket
ship. Add a high score that persists across games using localStorage.
This is how agentic coding works: you describe what you want in plain language, and the agent makes the edits.
You stay in control: review the changes, /undo if you don't like something, and keep iterating
until it's perfect.
Cleanup: unloading LM Studio
When you're done coding, free the memory by unloading the model:
lms unload
This releases the model from memory but keeps LM Studio's daemon running. To fully shut down:
lms server stop
lms daemon off
Next time you want to use it, just start the daemon, load the model, and start the server again.
Further reading
You've got this. When something breaks, it's almost always URL, port, or model name. Check those three first and pour yourself something nice.
If you have any questions, feel free to reach out to us on Reddit or through our contact form.
The ByteShape team