13 models across 5 providers. Route through one OpenAI-compatible endpoint.
DeepSeek: DeepSeek V4 Flash
DeepSeek
deepseek-v4-flash
1M
393K
$0.14/M
$0.28/M
$0.0028/M
Lightweight and efficient MoE model with 284B total parameters and 13B activated parameters, natively supporting million-token ultra-long context. Offers fast inference, low latency, and low call cost with balanced overall capability. Targeted at high-concurrency, lightweight tasks and well suited for everyday conversation, content creation, basic RAG, and batch copy processing in mainstream cost-sensitive scenarios.
DeepSeek: DeepSeek V4 Pro
DeepSeek
deepseek-v4-pro
1M
393K
$0.435/M
$0.87/M
$0.003625/M
Flagship MoE large model with 1.6T total parameters and 49B activated parameters, natively supporting million-token ultra-long context. Backed by massive high-quality training data, it delivers top-tier mathematical logic, complex reasoning, professional coding, and deep long-text comprehension—well suited for advanced research, complex office workflows, and deep intelligent agent scenarios.
Qwen: Qwen3.5 Flash
Qwen
qwen3.5-flash
1M
65.5K
$0.03/M
$0.29/M
$0.0029/M
Qwen3.5 native vision-language Flash model, built on a hybrid architecture combining linear attention and a sparse mixture-of-experts design for higher inference efficiency. Both pure-text and multimodal performance leap forward versus the 3 series; delivers fast response while balancing inference speed and performance.
Qwen: Qwen3.5 Plus
Qwen
qwen3.5-plus
1M
65.5K
$0.12/M
$0.71/M
$0.01/M
Qwen3.5 native vision-language Plus model, built on a hybrid architecture combining linear attention and a sparse mixture-of-experts design for higher inference efficiency. Across multiple benchmarks, the 3.5 series delivers exceptional performance comparable to today's leading frontier models, with major leaps over the 3 series in both pure text and multimodal scenarios. This version is functionally equivalent to the snapshot model qwen3.5-plus-2026-02-15.
Qwen: Qwen3.6 Flash
Qwen
qwen3.6-flash
1M
65.5K
$0.18/M
$1.06/M
$0.02/M
Qwen3.6 native vision-language Flash model, with significantly improved performance over 3.5-Flash. This model focuses on enhancing agentic coding capabilities (substantially surpassing previous generations on multiple code-agent benchmarks), mathematical reasoning, and code reasoning; on the vision side, spatial intelligence is markedly strengthened, with especially notable gains in object localization and detection.
Qwen: Qwen3.6 Plus
Qwen
qwen3.6-plus
1M
65.5K
$0.29/M
$1.76/M
$0.03/M
Qwen3.6 native vision-language Plus model, delivering exceptional performance comparable to today's leading frontier models, with significantly improved results over the 3.5 series. The model is markedly enhanced in agentic coding, front-end programming, and vibe coding, as well as in multimodal universal recognition, OCR, and object localization. This version is functionally equivalent to the snapshot model qwen3.6-plus-2026-04-02.
Qwen: Qwen3.7 Max
Qwen
qwen3.7-max
1M
65.5K
$1.25/M
$3.75/M
$0.25/M
The largest and most capable Max model in the Qwen3.7 series, currently opened with pure-text capabilities for early experience. Qwen3.7 is a new-generation flagship model built for the agent era, with core advantages in the breadth and depth of agent capabilities—excelling in programming, office and productivity tasks, and long-horizon autonomous execution. This version is functionally equivalent to the snapshot model qwen3.7-max-2026-05-20.
MiniMax: MiniMax M2.5
MiniMax
minimax-m2.5
200K
131K
$0.31/M
$1.24/M
$0.03/M
SOTA for the agent world. Purpose-built for Agent 2.0, it extends coding into real-world workspaces, entertainment, and personal assistance. A global SOTA open-source coding and agent model: SWE-bench Pro and SWE-bench Verified scores surpass Opus 4.6; global SOTA on Excel, search & research, and document summarization; lightning fast with optimized thinking efficiency at 100+ TPS, delivering 3x the speed of Opus; extreme cost-performance to power always-on agents.
MiniMax: MiniMax M2.7
MiniMax
minimax-m2.7
200K
131K
$0.31/M
$1.24/M
$0.06/M
M2.7 can autonomously build complex Agent Harnesses and tackle highly sophisticated productivity tasks through Agent Teams, complex Skills, and Tool Search.
Moonshot: Kimi K2.5
Moonshot
kimi-k2.5
262K
16.4K
$0.59/M
$3.09/M
$0.06/M
kimi-k2.5 is Moonshot's most versatile model to date, featuring a native multimodal architecture that simultaneously supports vision and text input, thinking and non-thinking modes, and both conversational and Agent tasks.
Moonshot: Kimi K2.6
Moonshot
kimi-k2.6
262K
16.4K
$0.96/M
$3.97/M
$0.1/M
kimi-k2.6 is Kimi's latest and most intelligent model, with stronger and more stable long-horizon code authoring, and significantly improved instruction following and self-correction. It supports text, image, and video input, thinking and non-thinking modes, and both conversational and Agent tasks.
Z.ai: GLM-5
Zhipu
glm-5
198K
16.4K
$0.59/M
$2.65/M
$0.12/M
GLM-5 is a new-generation large model built for Coding and Agent scenarios, achieving open-source SOTA on complex systems engineering and long-horizon tasks, with real-world programming experience approaching the level of Claude Opus. Based on a new 744B foundation, asynchronous reinforcement learning, and sparse attention, it delivers a comprehensive upgrade from "writing code" to "writing engineering systems".
Z.ai: GLM-5.1
Zhipu
glm-5.1
202K
131K
$0.88/M
$3.53/M
$0.09/M
GLM-5.1 is Zhipu AI's model designed for Long Horizon Tasks, featuring 744B total parameters, supporting 200K ultra-long context and up to 128K output tokens. It offers powerful logical reasoning, long-text understanding, and code generation, balancing performance and inference efficiency; it performs excellently across multi-task benchmarks and suits intelligent interaction, enterprise applications, and developer assistance.