Z.ai: GLM-5.1
glm-5.1
chat
Zhipu
Quick Reference
- Input: Text
- Output: Text
- Context: 202K
- Max Output: 131K
- Input Price: $0.88/M
- Output Price: $3.53/M
- Author: Zhipu AI
- Version: main
- Open Source: Yes
Overview
GLM-5.1 is Zhipu AI's model for long-horizon tasks. It has 744B total parameters, supports a 200K-token context window, and can generate up to 128K output tokens. It targets logical reasoning, long-text understanding, and code generation while balancing performance against inference efficiency. It is described as performing well across multi-task benchmarks and is suited to intelligent interaction, enterprise applications, and developer assistance.
Input modalities
Text
Output modalities
Text
Capabilities
chat
Features
Function Calling
Structured Output
Caching
Batch Processing
Web Search
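As a rough illustration of the Function Calling feature, the sketch below builds a request in the OpenAI-style `tools` format. It assumes the endpoint accepts that format for `glm-5.1`, which this page does not confirm in detail. The `get_weather` tool and the `build_tool_request` helper are hypothetical names invented for the example. No network call is made.

```python
import json


def build_tool_request(question: str) -> dict:
    """Build keyword arguments for client.chat.completions.create().

    Assumption: the endpoint accepts the OpenAI-style `tools` schema.
    """
    # Hypothetical tool definition; not part of any real API.
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "glm-5.1",
        "messages": [{"role": "user", "content": question}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


request = build_tool_request("What is the weather in Beijing?")
print(json.dumps(request, indent=2))
```

The resulting dictionary could be passed as `client.chat.completions.create(**request)`. If the model chooses to call the tool, an OpenAI-compatible response would normally carry the call under `response.choices[0].message.tool_calls`.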
Pricing
Per-token prices for Z.ai: GLM-5.1.
| Token Type | Price (USD) | Unit |
|---|---|---|
| Input | $0.88 | per million tokens |
| Output | $3.53 | per million tokens |
| Cache Read | $0.09 | per million tokens |
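To make the table concrete, here is a minimal cost estimator using the listed prices. The `estimate_cost` helper is hypothetical. The sketch assumes cached tokens are a subset of the input tokens and are billed at the cache-read rate instead of the input rate. Actual billing rules may differ.

```python
# Prices in USD per million tokens, copied from the pricing table.
PRICE_PER_MILLION = {"input": 0.88, "output": 3.53, "cache_read": 0.09}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens are part of input_tokens and are charged
    at the cache-read price rather than the regular input price.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


# 10,000 input tokens and 2,000 output tokens, nothing cached.
print(f"{estimate_cost(10_000, 2_000):.5f}")
# Same request with 8,000 of the input tokens served from cache.
print(f"{estimate_cost(10_000, 2_000, cached_tokens=8_000):.5f}")
```

Under these assumptions the first request costs about $0.01586 and the second about $0.00954, so caching mostly helps prompts with a large repeated prefix.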
Specifications
Context Window
202K tokens
Max Input
71K tokens
Max Output
131K tokens
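The three limits are consistent with each other: 71K input plus 131K output adds up to the 202K context window. A minimal sketch of a pre-flight check follows. It assumes "K" means 1,000 tokens (the exact limits may be slightly different) and that the prompt's token count is already known. The `clamp_max_tokens` helper is hypothetical.

```python
# Rounded limits from the specifications; the exact values are assumptions.
CONTEXT_WINDOW = 202_000
MAX_INPUT = 71_000
MAX_OUTPUT = 131_000


def clamp_max_tokens(prompt_tokens: int, requested_output: int) -> int:
    """Return a max_tokens value that respects the listed limits."""
    if prompt_tokens > MAX_INPUT:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, limit is {MAX_INPUT}"
        )
    # Space left in the context window after the prompt (defensive guard).
    available = CONTEXT_WINDOW - prompt_tokens
    return min(requested_output, MAX_OUTPUT, available)


print(clamp_max_tokens(1_000, 512))        # small request passes through unchanged
print(clamp_max_tokens(50_000, 200_000))   # oversized request is capped at MAX_OUTPUT
```

The returned value can be passed as `max_tokens` in a chat completion request.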
API Reference
OpenAI-compatible endpoint at https://api.inferoute.ai/v1.
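import os

from openai import OpenAI

# Point the OpenAI client at the OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key=os.environ.get("INFEROUTE_API_KEY"),  # read the key from the environment
)

try:
    # Send a single chat completion request to glm-5.1.
    response = client.chat.completions.create(
        model="glm-5.1",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a haiku about recursion."},
        ],
        max_tokens=512,
        temperature=0.7,
    )
    # Print the text of the first returned choice.
    print(response.choices[0].message.content)
except Exception as e:
    # Report any client or API failure instead of crashing.
    print(f"Error: {e}")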