Z.ai: GLM-5
Model ID: glm-5
Quick Reference
- Input: Text
- Output: Text
- Context: 198K tokens
- Max Output: 16.4K tokens
- Input Price: $0.59/M tokens
- Output Price: $2.65/M tokens
- Author: Zhipu AI
- Version: main
- Open Source: Yes
Overview
GLM-5 is a new-generation large model built for coding and agent scenarios. It is reported to achieve state-of-the-art results among open-source models on complex systems-engineering and long-horizon tasks, with real-world programming performance approaching that of Claude Opus. It is built on a new 744B-parameter foundation model, trained with asynchronous reinforcement learning, and uses sparse attention; together these are positioned as a comprehensive upgrade from "writing code" to "writing engineering systems".
Input modalities: Text
Output modalities: Text
Capabilities: chat
Features:
- Function Calling
- Structured Output
- Caching
- Batch Processing
- Web Search
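Because Function Calling is listed and the model is served through an OpenAI-compatible endpoint, a tool call can plausibly be made with the standard `tools` parameter of the OpenAI Python SDK. This is a minimal sketch, assuming the endpoint accepts OpenAI-style tool definitions and `tool` result messages; the `get_weather` tool, its schema, and the `run_tool_call` helper are hypothetical. The SDK is imported inside `main()` so the schema and dispatch logic can be exercised offline.

```python
import json
import os

# Hypothetical tool definition in the OpenAI "tools" format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> dict:
    """Hypothetical local implementation that returns canned data."""
    return {"city": city, "temperature_c": 21}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch a model-requested tool call and return the result as a JSON string."""
    handlers = {"get_weather": get_weather}
    if name not in handlers:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)  # the model sends arguments as a JSON string
    return json.dumps(handlers[name](**args))


def main() -> None:
    from openai import OpenAI  # lazy import: the helpers above do not need the SDK

    client = OpenAI(
        base_url="https://api.inferoute.ai/v1",
        api_key=os.environ["INFEROUTE_API_KEY"],
    )
    messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
    first = client.chat.completions.create(
        model="glm-5", messages=messages, tools=[WEATHER_TOOL]
    )
    msg = first.choices[0].message
    if not msg.tool_calls:
        print(msg.content)  # the model answered directly
        return
    # Echo the assistant turn, then answer each tool call with a "tool" message.
    messages.append(msg)
    for call in msg.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_tool_call(call.function.name, call.function.arguments),
        })
    final = client.chat.completions.create(
        model="glm-5", messages=messages, tools=[WEATHER_TOOL]
    )
    print(final.choices[0].message.content)


if __name__ == "__main__" and "INFEROUTE_API_KEY" in os.environ:
    main()
```

The second request sends the tool results back so the model can compose a final answer; a real agent loop would repeat this until no further tool calls are returned.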
Pricing
Per-token prices for Z.ai: GLM-5.
| Token Type | Price | Unit |
|---|---|---|
| Input | $0.59 | per million tokens |
| Output | $2.65 | per million tokens |
| Cache Read | $0.12 | per million tokens |
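The per-million rates translate into a per-request cost by simple multiplication. This sketch assumes that cached prompt tokens are a subset of the prompt and are billed at the Cache Read rate instead of the Input rate (a common convention that this page does not state explicitly); `estimate_cost_usd` and the example token counts are illustrative.

```python
# Rates from the pricing table, in USD per million tokens.
INPUT_PER_M = 0.59
OUTPUT_PER_M = 2.65
CACHE_READ_PER_M = 0.12


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate one request's cost in USD.

    Assumption: cached_tokens is a subset of prompt_tokens billed at the cache-read rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    return (
        uncached * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + completion_tokens * OUTPUT_PER_M
    ) / 1_000_000


# 100K prompt tokens (60K served from cache) plus 2K output tokens.
print(f"${estimate_cost_usd(100_000, 2_000, cached_tokens=60_000):.4f}")  # prints $0.0361
```

Under this assumption, the cache saves $0.47 per million repeated prompt tokens, which matters most for long system prompts or agent loops that resend the same context.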
Specifications
- Context Window: 198K tokens
- Max Input: 182K tokens
- Max Output: 16.4K tokens
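The input and output limits are separate from the overall window, so a request must satisfy all three. This sketch treats "K" as 1,000 tokens and uses the rounded figures above, so the provider's exact limits may differ slightly; `fits_budget` and `largest_max_tokens` are hypothetical helpers, and prompt token counts must come from your own tokenizer or from the API's usage report.

```python
# Limits from the Specifications section, assuming K = 1,000 tokens (rounded values).
CONTEXT_WINDOW = 198_000
MAX_INPUT = 182_000
MAX_OUTPUT = 16_400


def fits_budget(prompt_tokens: int, max_tokens: int) -> bool:
    """Return True if a request respects the input, output, and total limits."""
    return (
        prompt_tokens <= MAX_INPUT
        and max_tokens <= MAX_OUTPUT
        and prompt_tokens + max_tokens <= CONTEXT_WINDOW
    )


def largest_max_tokens(prompt_tokens: int) -> int:
    """Largest max_tokens that still fits alongside the prompt (0 if the prompt is too long)."""
    if prompt_tokens > MAX_INPUT:
        return 0
    return max(0, min(MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens))
```

Note that with these rounded numbers, 182K input plus 16.4K output exceeds the 198K window, so a maximal prompt leaves room for about 16K output tokens rather than the full 16.4K.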
API Reference
GLM-5 is served through an OpenAI-compatible endpoint at https://api.inferoute.ai/v1, so the OpenAI Python SDK can call it by setting `base_url`:
```python
import os

from openai import APIError, OpenAI

# Point the OpenAI SDK at the OpenAI-compatible Inferoute endpoint.
client = OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key=os.environ.get("INFEROUTE_API_KEY"),
)

try:
    response = client.chat.completions.create(
        model="glm-5",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a haiku about recursion."},
        ],
        max_tokens=512,  # must stay within the 16.4K output limit
        temperature=0.7,
    )
    print(response.choices[0].message.content)
except APIError as e:
    # Raised by the SDK for connection failures, timeouts, and non-2xx responses.
    print(f"Error: {e}")
```
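For long generations, the same client can stream tokens as they are produced by passing `stream=True`, which the OpenAI Python SDK supports for chat completions; that this endpoint honors streaming is an assumption here. The `collect_stream` helper is hypothetical: it prints and concatenates each text delta, skipping chunks that carry no content (such as a trailing usage-only chunk). The SDK is imported inside `main()` so the helper can be tested without it.

```python
import os
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # role-only or final chunks may have no content
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    if echo:
        print()
    return "".join(parts)


def main() -> None:
    from openai import OpenAI  # lazy import: the helper above does not need the SDK

    client = OpenAI(
        base_url="https://api.inferoute.ai/v1",
        api_key=os.environ["INFEROUTE_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="glm-5",
        messages=[{"role": "user", "content": "Write a haiku about recursion."}],
        max_tokens=512,
        stream=True,
    )
    collect_stream(stream)


if __name__ == "__main__" and "INFEROUTE_API_KEY" in os.environ:
    main()
```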