Multi-model routing
Every agent reaches frontier models from Anthropic, OpenAI, xAI, and Cerebras behind one interface. Each call goes to the model that fits the task — quality, latency, or cost — so the platform follows the frontier as it moves rather than being wired to whichever vendor was integrated first. Picking a model is part of defining an agent, not a buried platform setting. Give a research agent a frontier reasoning model and a triage agent a fast, small one, and run them side by side in the same workspace.A model per agent
Give each agent the model that fits its job, and run a fleet where every member is on a different model at once.
Automatic failover
If a provider errors or goes down, the call retries against a healthy one inside the same request.
Own-GPU inference for sensitive work
There is a quieter problem underneath model choice: most agent platforms ship your data to a third-party API just to embed or rerank it — the highest-volume calls they make. HQ answers this by serving the cheap, constant, and data-sensitive paths on hardware it runs itself.Open models we serve ourselves
Open-weight models run on our own GPUs, so the high-volume and data-sensitive paths never have to leave our infrastructure.
Embeddings, in-house
The vectors behind memory and search come from an embedding model we host, so indexed text is never sent to an outside API to be turned into numbers.
Reranking, in-house
A self-hosted reranking model does the final relevance pass on memory and corpus lookups, with no third-party round-trip.
A batch path at fleet scale
Large non-interactive jobs run through a batch lane for work that can wait rather than answer in the moment.
Independent of the runtime
The model and the runtime are two separate dials. Each agent runs on a vendor’s official agent loop — the Claude Agent SDK, the OpenAI Agents SDK, or the open OpenCode runtime — and any model can run under any of those runtimes, so each choice is made on its own merits.The model layer sits inside qOS next to search, cache, and storage, sharing the same hardware, isolation, and audit trail as everything else an agent touches. Adding or swapping a model is a config change, with nothing to alter in the agents on top.