GLM 5.2 Easy Setup

GLM 5.2 is a cheap, open model, MIT licensed, with a real 1M-token context window. For most people, the best setup right now is GLM 5.2 on Baseten, inside Factory (Droid). You get the full 1M context, fast US-hosted inference, and you can let your agent do the setup for you. That's the fast path below, and it's where you should start.

If you want a different editor, or you'd rather use one key across all of them, there's an OpenRouter path further down that works in Claude Code, Droid, and Codex.

One honest caveat up front, because it saves you a wasted day: GLM 5.2 is text only.

Setting this up with an AI agent? Point it at this page and tell it your editor and your operating system. For the fast path, tell Droid to follow the Baseten section (or Factory's own doc at https://docs.factory.ai/cli/byok/baseten) and hand it your Baseten key.

The fast path: GLM 5.2 on Baseten in Factory (start here)

Why this one. Baseten serves GLM 5.2 directly with the full 1M-token context window. It's US-hosted with very low time-to-first-token, so it feels fast. And GLM 5.2 is currently the leading open-weights model for the money. For most workflows, this is the setup you want.

The model slug on Baseten is zai-org/GLM-5.2. Model page: https://www.baseten.co/library/glm-52/

Step 1: Get a Baseten key

Open the API keys page in settings and create a key.

Add a little credit. Reference pricing is around $1.40 per million input tokens and $4.40 per million output, far below frontier models. Check the model page for current numbers.

Optional but recommended: store the key in your shell so you never paste it into a config file.


# add to ~/.zshrc or ~/.bashrc
export BASETEN_API_KEY="your-baseten-key-here"

Restart your terminal after editing your shell profile.

Step 2: Let the agent set it up (or paste it yourself)

The fast way: open Droid and tell it to set up GLM 5.2 from Baseten, pointing it at Factory's own doc at https://docs.factory.ai/cli/byok/baseten, and give it your Baseten key. It reads the doc and writes the config for you. You are using the agent to configure the agent.

If you'd rather do it by hand, add this to ~/.factory/settings.json:


{
  "customModels": [
    {
      "model": "zai-org/GLM-5.2",
      "displayName": "GLM 5.2 [Baseten]",
      "baseUrl": "https://inference.baseten.co/v1",
      "apiKey": "YOUR_BASETEN_API_KEY",
      "provider": "generic-chat-completion-api",
      "maxOutputTokens": 131072
    }
  ]
}

Two things worth knowing:

The provider is generic-chat-completion-api because Baseten's endpoint is OpenAI-compatible.

maxOutputTokens of 131072 matches GLM 5.2's output cap. The 1M is the input context window, and it comes from the model and the Baseten endpoint, so you don't have to set it.

The ${BASETEN_API_KEY} shell reference works here too if you'd rather not paste the raw key into the file.

Step 3: Pick it and test

Run droid, type /model, pick GLM 5.2 [Baseten], and send something small to prove you're really on it and connected.


Write a Python function that reverses a linked list, then explain it in two sentences.

If you get a clean answer, you're set. Now point it at your actual code.

No key needed: Droid Core (if you're on a Factory plan)

If you have a Factory plan, you can skip the key entirely. Droid Core is Factory's managed open-model tier, and it includes a GLM option built in. Open /model and pick the Droid Core GLM entry. Confirm the version label in your client, since the lineup updates over time. The Baseten path above is the one to use when you want the full 1M and the raw speed without a plan.

Other editors, or one key across all of them (OpenRouter)

If you're in Claude Code or Codex, or you'd rather use a single key everywhere, OpenRouter picks a host for you and fails over automatically, so one key just works. The model slug here is z-ai/glm-5.2. Model page: https://openrouter.ai/z-ai/glm-5.2

Get your OpenRouter key

Open Keys and create a new key (it starts with sk-or-).

Add a little credit under Billing.


# add to ~/.zshrc or ~/.bashrc
export OPENROUTER_API_KEY="sk-or-your-key-here"

Restart your terminal after editing your shell profile.

Claude Code

Add these to your ~/.zshrc or ~/.bashrc (or to an env block in .claude/settings.local.json if you prefer to scope it per project; Claude Code does not read a plain .env).


export OPENROUTER_API_KEY="sk-or-your-key-here"
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY"
export ANTHROPIC_API_KEY=""   # must be empty, not unset
export ANTHROPIC_DEFAULT_SONNET_MODEL="z-ai/glm-5.2"

Restart your terminal, run claude, and check /status. You want the base URL to read https://openrouter.ai/api and the auth token to read ANTHROPIC_AUTH_TOKEN.

Things to get right:

ANTHROPIC_API_KEY has to be an empty string. If it's unset, or it still holds an old Anthropic key, Claude Code can quietly authenticate against Anthropic instead of OpenRouter.

If you logged into Anthropic before, run /logout inside Claude Code, then quit and relaunch.

The full 1M-context variant (glm-5.2[1m]) and the auto-compact tuning are features of Z.ai's own Coding Plan endpoint, not the OpenRouter slug. For most work the standard context is plenty. If you specifically need the full million-token window inside Claude Code, that's the one case where a Z.ai Coding Plan key beats the one-key OpenRouter path. (Or use the Baseten path above, which serves the full 1M.)

Droid (Factory) via OpenRouter

The recommended Droid setup is the Baseten fast path above. Use this only if you want one OpenRouter key across several editors.


{
  "customModels": [
    {
      "model": "z-ai/glm-5.2",
      "displayName": "GLM 5.2 (OpenRouter)",
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "${OPENROUTER_API_KEY}",
      "provider": "generic-chat-completion-api",
      "maxOutputTokens": 131072
    }
  ]
}

Run droid, type /model, and pick GLM 5.2 from the Custom models section.

Codex CLI

Edit your user-level config at ~/.codex/config.toml.


model = "z-ai/glm-5.2"
model_provider = "openrouter"
model_reasoning_effort = "high"

[model_providers.openrouter]
name = "OpenRouter"
base_url = "https://openrouter.ai/api/v1"
env_key = "OPENROUTER_API_KEY"
wire_api = "responses"

Make sure OPENROUTER_API_KEY is set in your environment, then run codex. A few things to get right (Codex is the strictest):

The model slug must be exact, including the z-ai/ prefix. A bare glm-5.2 will not match.

model_provider and model_providers only work in your user-level ~/.codex/config.toml. Put them there, not in a project-level .codex/config.toml, or Codex ignores them and prints a warning on startup.

wire_api = "responses" is required for a custom provider, which is already in the block above.

The gotchas nobody warns you about

This is the part that turns a quick setup into a lost afternoon if you don't know it going in.

Images don't work (it's text only)

GLM 5.2 is text in, text out. There is no vision. If you paste a screenshot or a mockup, it gets dropped or errors, on every provider, because it's the model, not your config. Three ways to handle it:

Keep a vision-capable model configured alongside GLM 5.2 and switch with /model when you need to look at an image.

Describe the image in words and let GLM 5.2 work from the description.

For screenshot-heavy or design-to-code work, GLM 5.2 is the wrong tool. Use it for the text and reasoning, and bring in a vision model for the pixels.

Quantization (fp8 vs fp4)

Different hosts serve the same model at different precision. The cheapest routes often serve fp4, which is smaller and cheaper. Others (Baseten, Z.ai, Fireworks, Novita) serve fp8. For coding the quality gap is usually small but real, and it's why the price differs between hosts. The Baseten fast path above runs fp8. If your results feel off on a cheaper route, pin an fp8 provider and compare on your own task.

Thinking effort (High vs Max)

GLM 5.2 has two thinking levels, High and Max. Max thinks harder and costs more time and tokens; High is faster. In Codex you set this with model_reasoning_effort. Start on High for everyday work and bump to Max when a task is genuinely hard.

Going further: other hosts and US data residency

Baseten is the default recommendation above: US-hosted, fp8, and fast. If you want to compare or steer to a different host, OpenRouter lets you set provider preferences, and most of these also expose an OpenAI-compatible endpoint you can point an editor at directly:

Together AI: fast, US-hosted

Fireworks AI: US-hosted, fp8, day-zero support

DeepInfra: cheapest, US-hosted, fp4

US hosting is a real concern if data residency matters to you, since Z.ai's own API is hosted in China. To pin a host, set provider preferences in your OpenRouter account, or point your editor straight at that provider's OpenAI-compatible endpoint with that provider's own key, using the same config pattern above (swap the base URL, key, and model name).

When to reach for GLM 5.2 (and when not to)

Reach for it when you want a cheap, open model for long-horizon coding, when you need a big context window for whole-repo work, or when you want to run agentic tasks without frontier-model pricing.

Skip it when your workflow is screenshot-driven (no vision), or when you need the absolute top of the benchmark for the hardest reasoning. It's the strongest open option for the money, not a guaranteed number one for everything.

Keep going

That's the whole setup. A Baseten key, Factory, a few minutes, and you let the agent do the work.

If you want the deeper walkthrough, watch the stream where I set this up live, and come build with us.

Free community and more guides: https://rfer.me/startmyai

X: https://x.com/RayFernando1337

Prices and providers move fast. Check the Baseten model page at https://www.baseten.co/library/glm-52/ for current numbers before you commit.