Z.ai: GLM-5.1

glm-5.1
chat, Zhipu

Quick Reference

Input: Text
Output: Text
Context: 202K
Max Output: 131K
Input Price: $0.88/M
Output Price: $3.53/M
Author: Zhipu AI
Version: main
Open Source: Yes

Overview

GLM-5.1 is Zhipu AI's model for long-horizon tasks. It has 744B total parameters, supports a 200K-token context window, and can generate up to 128K output tokens (listed as 202K and 131K in the specifications). It targets logical reasoning, long-text understanding, and code generation while balancing performance against inference efficiency. It is reported to perform well across multi-task benchmarks and is aimed at intelligent interaction, enterprise applications, and developer assistance.

Input modalities

Text

Output modalities

Text

Capabilities

chat

Features

Function Calling
Structured Output
Caching
Batch Processing
Web Search
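
As an illustration of the Function Calling feature, the sketch below builds a request in the OpenAI chat-completions tools format. It assumes the endpoint accepts that format for glm-5.1, which the feature list implies but does not spell out. The get_weather tool, its schema, and the two helper functions are hypothetical.

```python
import json

# Hypothetical tool definition in the OpenAI-style "tools" schema.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(user_prompt: str) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    return {
        "model": "glm-5.1",
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [GET_WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


def parse_tool_arguments(arguments_json: str) -> dict:
    """Decode the JSON string that a tool call carries in function.arguments."""
    return json.loads(arguments_json)


request = build_tool_request("What is the weather in Beijing?")
print(json.dumps(request, indent=2))
```

With an OpenAI client configured for the endpoint, the dictionary would be passed as client.chat.completions.create(**request). In the OpenAI SDK, any tool calls come back on response.choices[0].message.tool_calls, each with a JSON-encoded function.arguments string.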

Pricing

Prices for Z.ai: GLM-5.1, in USD per million tokens.

Token Type    Price      Unit
Input         $0.88/M    per million tokens
Output        $3.53/M    per million tokens
Cache Read    $0.09/M    per million tokens
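
To make the pricing concrete, here is a rough cost estimator built from the prices listed above. It is a sketch and not a billing formula: estimate_cost is a hypothetical helper, and it assumes that cached tokens are a subset of the input tokens and are billed at the cache-read rate instead of the input rate.

```python
# Listed prices in USD per million tokens.
PRICE_PER_MILLION = {"input": 0.88, "output": 3.53, "cache_read": 0.09}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cached_tokens is the part of input_tokens served from cache,
    billed at the cache-read rate rather than the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


# 10,000 input tokens and 2,000 output tokens, nothing cached.
cost = estimate_cost(input_tokens=10_000, output_tokens=2_000)
print(f"${cost:.5f}")
```

For this request the arithmetic is 10,000 × $0.88/M + 2,000 × $3.53/M = $0.0088 + $0.00706 = $0.01586.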

Specifications

Context Window

202K tokens

Max Input

71K tokens

Max Output

131K tokens
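
The listed limits are consistent with one another: 71K of input plus 131K of output equals the 202K context window. A minimal sketch of a pre-flight check under those numbers follows. clamp_max_tokens is a hypothetical helper, the prompt token count is assumed to come from whatever tokenizer the caller uses, and "K" is read as 1,000 tokens, which may not match the provider's exact limits.

```python
# Limits as listed in the specifications.
# Assumption: "K" means 1,000 tokens; the provider's exact limits may differ.
CONTEXT_WINDOW = 202_000
MAX_INPUT = 71_000
MAX_OUTPUT = 131_000


def clamp_max_tokens(prompt_tokens: int, requested_output: int) -> int:
    """Return a max_tokens value that fits the listed limits.

    Raises ValueError if the prompt alone exceeds the listed input limit.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens > MAX_INPUT:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, listed input limit is {MAX_INPUT}"
        )
    # With the listed values the context-window term never binds, because
    # MAX_INPUT + MAX_OUTPUT == CONTEXT_WINDOW; it is kept as a guard in case
    # the constants are changed.
    return min(requested_output, MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens)


print(clamp_max_tokens(prompt_tokens=50_000, requested_output=200_000))
```

The returned value can be passed as max_tokens in a chat-completions request.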

API Reference

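The model is served through an OpenAI-compatible endpoint at https://api.inferoute.ai/v1. The example below uses the openai Python client and reads the API key from the INFEROUTE_API_KEY environment variable.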

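import os

from openai import OpenAI, OpenAIError

# Point the OpenAI client at the OpenAI-compatible endpoint.
# os.environ[...] raises KeyError immediately if the key is not set,
# instead of silently passing None to the client.
client = OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key=os.environ["INFEROUTE_API_KEY"],
)

try:
    response = client.chat.completions.create(
        model="glm-5.1",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a haiku about recursion."},
        ],
        max_tokens=512,  # cap on generated tokens for this request
        temperature=0.7,
    )
except OpenAIError as e:
    # Base class for errors raised by the openai client
    # (authentication, rate limits, connection problems, and so on).
    print(f"Error: {e}")
else:
    print(response.choices[0].message.content)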