Model Catalog

13 models across 5 providers. Route through one OpenAI-compatible endpoint.

13 models

DeepSeek: DeepSeek V4 Flash

DeepSeek

deepseek-v4-flash

Context

1M

Max output

393K

Input

$0.14/M

Output

$0.28/M

Cache

$0.0028/M

chat

Lightweight and efficient MoE model with 284B total parameters and 13B activated parameters, natively supporting million-token ultra-long context. Offers fast inference, low latency, and low call cost with balanced overall capability. Targeted at high-concurrency, lightweight tasks and well suited for everyday conversation, content creation, basic RAG, and batch copy processing in mainstream cost-sensitive scenarios.

View details

DeepSeek: DeepSeek V4 Pro

DeepSeek

deepseek-v4-pro

Context

1M

Max output

393K

Input

$0.435/M

Output

$0.87/M

Cache

$0.003625/M

chat

Flagship MoE large model with 1.6T total parameters and 49B activated parameters, natively supporting million-token ultra-long context. Backed by massive high-quality training data, it delivers top-tier mathematical logic, complex reasoning, professional coding, and deep long-text comprehension—well suited for advanced research, complex office workflows, and deep intelligent agent scenarios.

View details

Qwen: Qwen3.5 Flash

Qwen

qwen3.5-flash

Context

1M

Max output

65.5K

Input

$0.03/M

Output

$0.29/M

Cache

$0.0029/M

chatreasoningvision

Qwen3.5 native vision-language Flash model, built on a hybrid architecture combining linear attention and a sparse mixture-of-experts design for higher inference efficiency. Both pure-text and multimodal performance leap forward versus the 3 series; delivers fast response while balancing inference speed and performance.

View details

Qwen: Qwen3.5 Plus

Qwen

qwen3.5-plus

Context

1M

Max output

65.5K

Input

$0.12/M

Output

$0.71/M

Cache

$0.01/M

chatreasoningvision

Qwen3.5 native vision-language Plus model, built on a hybrid architecture combining linear attention and a sparse mixture-of-experts design for higher inference efficiency. Across multiple benchmarks, the 3.5 series delivers exceptional performance comparable to today's leading frontier models, with major leaps over the 3 series in both pure text and multimodal scenarios. This version is functionally equivalent to the snapshot model qwen3.5-plus-2026-02-15.

View details

Qwen: Qwen3.6 Flash

Qwen

qwen3.6-flash

Context

1M

Max output

65.5K

Input

$0.18/M

Output

$1.06/M

Cache

$0.02/M

chatreasoningvision

Qwen3.6 native vision-language Flash model, with significantly improved performance over 3.5-Flash. This model focuses on enhancing agentic coding capabilities (substantially surpassing previous generations on multiple code-agent benchmarks), mathematical reasoning, and code reasoning; on the vision side, spatial intelligence is markedly strengthened, with especially notable gains in object localization and detection.

View details

Qwen: Qwen3.6 Plus

Qwen

qwen3.6-plus

Context

1M

Max output

65.5K

Input

$0.29/M

Output

$1.76/M

Cache

$0.03/M

chatvision

Qwen3.6 native vision-language Plus model, delivering exceptional performance comparable to today's leading frontier models, with significantly improved results over the 3.5 series. The model is markedly enhanced in agentic coding, front-end programming, and vibe coding, as well as in multimodal universal recognition, OCR, and object localization. This version is functionally equivalent to the snapshot model qwen3.6-plus-2026-04-02.

View details

Qwen: Qwen3.7 Max

Qwen

qwen3.7-max

Context

1M

Max output

65.5K

Input

$1.25/M

Output

$3.75/M

Cache

$0.25/M

chatreasoning

The largest and most capable Max model in the Qwen3.7 series, currently opened with pure-text capabilities for early experience. Qwen3.7 is a new-generation flagship model built for the agent era, with core advantages in the breadth and depth of agent capabilities—excelling in programming, office and productivity tasks, and long-horizon autonomous execution. This version is functionally equivalent to the snapshot model qwen3.7-max-2026-05-20.

View details

MiniMax: MiniMax M2.5

MiniMax

minimax-m2.5

Context

200K

Max output

131K

Input

$0.31/M

Output

$1.24/M

Cache

$0.03/M

chatreasoning

SOTA for the agent world. Purpose-built for Agent 2.0, it extends coding into real-world workspaces, entertainment, and personal assistance. A global SOTA open-source coding and agent model: SWE-bench Pro and SWE-bench Verified scores surpass Opus 4.6; global SOTA on Excel, search & research, and document summarization; lightning fast with optimized thinking efficiency at 100+ TPS, delivering 3x the speed of Opus; extreme cost-performance to power always-on agents.

View details

MiniMax: MiniMax M2.7

MiniMax

minimax-m2.7

Context

200K

Max output

131K

Input

$0.31/M

Output

$1.24/M

Cache

$0.06/M

chatreasoning

M2.7 can autonomously build complex Agent Harnesses and tackle highly sophisticated productivity tasks through Agent Teams, complex Skills, and Tool Search.

View details

Moonshot: Kimi K2.5

Moonshot

kimi-k2.5

Context

262K

Max output

16.4K

Input

$0.59/M

Output

$3.09/M

Cache

$0.06/M

chatreasoningvision

kimi-k2.5 is Moonshot's most versatile model to date, featuring a native multimodal architecture that simultaneously supports vision and text input, thinking and non-thinking modes, and both conversational and Agent tasks.

View details

Moonshot: Kimi K2.6

Moonshot

kimi-k2.6

Context

262K

Max output

16.4K

Input

$0.96/M

Output

$3.97/M

Cache

$0.1/M

chatreasoningvision

kimi-k2.6 is Kimi's latest and most intelligent model, with stronger and more stable long-horizon code authoring, and significantly improved instruction following and self-correction. It supports text, image, and video input, thinking and non-thinking modes, and both conversational and Agent tasks.

View details

Z.ai: GLM-5

Zhipu

glm-5

Context

198K

Max output

16.4K

Input

$0.59/M

Output

$2.65/M

Cache

$0.12/M

chat

GLM-5 is a new-generation large model built for Coding and Agent scenarios, achieving open-source SOTA on complex systems engineering and long-horizon tasks, with real-world programming experience approaching the level of Claude Opus. Based on a new 744B foundation, asynchronous reinforcement learning, and sparse attention, it delivers a comprehensive upgrade from "writing code" to "writing engineering systems".

View details

Z.ai: GLM-5.1

Zhipu

glm-5.1

Context

202K

Max output

131K

Input

$0.88/M

Output

$3.53/M

Cache

$0.09/M

chat

GLM-5.1 is Zhipu AI's model designed for Long Horizon Tasks, featuring 744B total parameters, supporting 200K ultra-long context and up to 128K output tokens. It offers powerful logical reasoning, long-text understanding, and code generation, balancing performance and inference efficiency; it performs excellently across multi-task benchmarks and suits intelligent interaction, enterprise applications, and developer assistance.

View details