OpenCode + ByteShape:
Run a Local Coding Agent on Your Machine

Published by Ali Hadi Zadeh • 2 April 2026 • Tutorial

Totally necessary demo

Yes, it built a Flappy Bird clone.

Look, it's not curing cancer. But once your local coding agent is running, it can tackle the actually useful stuff: refactoring, debugging, writing tests, building real features, all on your hardware, for free, forever. A silly game just happens to be a great one-prompt stress test.

Hover over the game and press Space, or just tap/click it to start.

Hey! If you can open a terminal and follow steps, you're in the right place. This guide is for people who are new to LLMs, local inference, and all the jargon. We'll unpack it as we go.

Short version: ByteShape gives you the optimized model; LM Studio, llama.cpp, or Ollama runs it on your machine; and OpenCode provides the agentic coding interface, letting the model work through your instructions and interact directly with your code from the terminal. Three pieces, one workflow.

What on earth are we installing?

The architecture

ByteShape weights — optimized GGUF model files
        ↓
LM Studio (install & serve)  OR  llama.cpp (build & serve)  OR  Ollama (pull & serve)
        ↓
OpenCode — coding agent · TUI

ByteShape ships optimized model weights (as GGUF files) tuned for speed and efficiency. You still need an inference engine to load those weights; that's LM Studio, llama.cpp, or Ollama. On top of the inference engine sits OpenCode, the coding agent that talks to your model over a local API.

What is OpenCode?

OpenCode is an open-source terminal UI (TUI) coding agent. It talks to your model over normal APIs, can use tools (edit files, run commands, etc.), and reads its config from ~/.config/opencode/opencode.jsonc. It's the pilot, not the engine. The engine is LM Studio, llama.cpp, or Ollama.
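To give you a feel for that config file, here's a sketch of how a local engine might be registered as a provider. The provider key, display names, model ID, and port are placeholders (1234 is LM Studio's default server port) — treat this as illustrative and check OpenCode's provider documentation for the exact schema:

```jsonc
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    // "lmstudio" is an arbitrary key we chose for this local provider
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (local)",
      "options": {
        // LM Studio's OpenAI-compatible server, default port 1234
        "baseURL": "http://localhost:1234/v1"
      },
      "models": {
        // placeholder identifier — use whatever your engine reports
        "local-coder": { "name": "Local coder model" }
      }
    }
  }
}
```

Point `baseURL` at whichever engine you end up running; the rest of the file stays the same.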

LM Studio vs llama.cpp vs Ollama — pick your vibe

All three inference engines load the same GGUF weights and expose a local HTTP API for OpenCode to talk to. The difference is in how much setup you want and how much control you need.

|          | LM Studio | llama.cpp | Ollama |
|----------|-----------|-----------|--------|
| Vibe     | "Easy to use and runs everything" | "I want every knob and dial" | "I want it to Just Work" |
| Setup    | Install, browse & download model, load | Build or use releases, tune flags | Install, pull model, go |
| Best for | Broad model compatibility with a friendly CLI | KV cache tweaks, layers, server defaults | Getting started fast (but not all models are supported) |
  • LM Studio: Desktop app with a CLI (lms). Browse and download models interactively, load them with a fixed identifier for consistent API usage, and start an OpenAI-compatible server. Works on Mac, Linux, and Windows (WSL2).
  • llama.cpp: Lower-level, super capable. Its llama-server component speaks an OpenAI-style API and exposes CLI options for context size, batching, KV cache quantization, GPU layers, and more. If you love reading --help, you'll be happy here.
  • Ollama: Downloads models, keeps things tidy, and exposes a local HTTP API. Minimal setup, but it doesn't support the Qwen3.5 family yet, so in this tutorial it's only an option for the Coder 30B model.
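Whichever engine you pick, they all speak the same OpenAI-style chat API, which is exactly why OpenCode can treat them interchangeably. As a rough sketch, the request body an agent sends to `POST /v1/chat/completions` looks like this (the model name `local-coder` is a placeholder, not a real identifier):

```python
import json

# Illustrative sketch of the OpenAI-style chat request an agent like
# OpenCode sends to a local engine. LM Studio, llama-server, and Ollama
# all accept this shape on their OpenAI-compatible endpoints.
def build_chat_request(prompt: str, model: str = "local-coder") -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # agents stream tokens as they're generated
    }

print(json.dumps(build_chat_request("Refactor utils.py"), indent=2))
```

Because the request shape is identical everywhere, switching engines later means changing a base URL, not rewriting anything.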

KV cache in one line: While generating text, the model remembers past tokens in a cache. Tuning how that cache is stored can save VRAM. llama.cpp tends to expose the most options here.
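To make that concrete, here's a back-of-envelope sketch of KV cache memory. The layer and head counts below are illustrative placeholders, not the specs of any model in this tutorial, and q8_0 is approximated as 1 byte per element:

```python
# Rough KV cache sizing: K and V are each stored per layer, per KV head,
# per position in the context window.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical model: 48 layers, 8 KV heads (GQA), head dim 128, 32K context
fp16 = kv_cache_bytes(48, 8, 128, 32768, 2)  # fp16 cache: 2 bytes/element
q8 = kv_cache_bytes(48, 8, 128, 32768, 1)    # ~q8_0 cache: ≈1 byte/element

print(f"fp16 KV cache: {fp16 / 2**30:.1f} GiB")  # 6.0 GiB
print(f"q8   KV cache: {q8 / 2**30:.1f} GiB")    # 3.0 GiB
```

Halving the cache's precision roughly halves its VRAM footprint, which is why llama.cpp's cache-type flags matter at long context lengths.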

Configure your setup

[Interactive selector: pick a model, inference engine, and OS above to see step-by-step instructions for your setup.]