Z.ai: GLM-5

glm-5

chat

Zhipu

Quick Reference

Input: Text
Output: Text

Context: 198K
Max Output: 16.4K

Input Price: $0.59/M
Output Price: $2.65/M

Author: Zhipu AI
Version: main
Open Source: Yes

Overview

GLM-5 is a new-generation large language model built for coding and agent scenarios. It achieves state-of-the-art results among open-source models on complex systems engineering and long-horizon tasks, with a real-world programming experience approaching that of Claude Opus. Built on a new 744B-parameter foundation model, asynchronous reinforcement learning, and sparse attention, it marks a shift from "writing code" to "writing engineering systems".

Input modalities

Text

Output modalities

Text

Capabilities

chat

Features

Function Calling

Structured Output

Caching

Batch Processing

Web Search
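As a rough illustration of the Function Calling feature, the sketch below builds a request in the OpenAI-style `tools` format and decodes the arguments of a returned tool call. It assumes the endpoint accepts the standard OpenAI `tools` and `tool_choice` parameters, which this page does not confirm; the `get_weather` tool and both helper functions are hypothetical names introduced only for this example.

```python
import json


def build_tool_request(user_prompt: str) -> dict:
    """Build keyword arguments for a chat completion that offers one tool."""
    # Hypothetical tool definition in the OpenAI function-calling schema.
    get_weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "glm-5",
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [get_weather_tool],
        # Assumption: "auto" lets the model decide whether to call the tool.
        "tool_choice": "auto",
    }


def parse_tool_arguments(raw_arguments: str) -> dict:
    """Decode the JSON string carried in a tool call's function.arguments."""
    try:
        return json.loads(raw_arguments)
    except json.JSONDecodeError:
        # Models can emit malformed JSON; fall back to an empty dict.
        return {}


request = build_tool_request("What's the weather in Paris?")
print(json.dumps(request, indent=2))
```

The resulting dictionary could be passed to the client shown under API Reference as `client.chat.completions.create(**request)`. If the model chooses to call the tool, the arguments would arrive as a JSON string on `response.choices[0].message.tool_calls[0].function.arguments`, which `parse_tool_arguments` decodes.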

Pricing

Token prices for Z.ai: GLM-5, in USD per million tokens.

Token Type	Price	Unit
Input	$0.59/M	per million tokens
Output	$2.65/M	per million tokens
Cache Read	$0.12/M	per million tokens
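To make the price table concrete, here is a minimal cost estimator using the three rates above. It assumes that cached tokens are a subset of the input tokens and are billed at the Cache Read rate instead of the Input rate; this page does not spell out the billing rule, so treat that as an assumption. The function name `estimate_cost_usd` is hypothetical.

```python
# Rates from the pricing table, in USD per million tokens.
INPUT_USD_PER_M = 0.59
OUTPUT_USD_PER_M = 2.65
CACHE_READ_USD_PER_M = 0.12


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cached_tokens is the part of input_tokens served from cache
    and is charged at the cache-read rate rather than the input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    uncached_tokens = input_tokens - cached_tokens
    total_per_million = (
        uncached_tokens * INPUT_USD_PER_M
        + cached_tokens * CACHE_READ_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total_per_million / 1_000_000


# 100K input tokens (20K of them cached) and 4K output tokens.
cost = estimate_cost_usd(input_tokens=100_000, output_tokens=4_000, cached_tokens=20_000)
print(f"${cost:.4f}")  # prints $0.0602
```

In practice the token counts would come from the `usage` field of an API response rather than being hard-coded.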

Specifications

Context Window

198K tokens

Max Input

182K tokens

Max Output

16.4K tokens
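The three limits interact: a request has to fit the input cap, the output cap, and the overall context window at once. The sketch below picks a safe `max_tokens` value for a given prompt size. It reads "K" as 1,000 tokens, since exact limits are not stated here, and it expects the caller to supply the prompt token count because no tokenizer is specified. The function name `clamp_max_tokens` is hypothetical.

```python
# Limits from the specifications above, reading "K" as 1,000 tokens (assumption).
CONTEXT_WINDOW = 198_000
MAX_INPUT = 182_000
MAX_OUTPUT = 16_400


def clamp_max_tokens(prompt_tokens: int, requested_output: int) -> int:
    """Return the largest max_tokens value that fits all three limits."""
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens > MAX_INPUT:
        raise ValueError(
            f"prompt of {prompt_tokens} tokens exceeds the {MAX_INPUT}-token input limit"
        )
    # Space left in the context window once the prompt is accounted for.
    remaining_context = CONTEXT_WINDOW - prompt_tokens
    return min(requested_output, MAX_OUTPUT, remaining_context)


print(clamp_max_tokens(prompt_tokens=1_000, requested_output=512))
print(clamp_max_tokens(prompt_tokens=1_000, requested_output=50_000))
print(clamp_max_tokens(prompt_tokens=182_000, requested_output=20_000))
```

The three calls show a request that fits as-is, one capped by the output limit, and one capped by the space left in the context window.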

API Reference

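The model is served through an OpenAI-compatible endpoint at https://api.inferoute.ai/v1. The example below uses the openai Python client and reads the API key from the INFEROUTE_API_KEY environment variable.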

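import os
from openai import OpenAI, APIError

# Point the OpenAI client at the OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.inferoute.ai/v1",
    # Indexing (rather than .get) raises KeyError early if the key is unset.
    api_key=os.environ["INFEROUTE_API_KEY"],
)

try:
    response = client.chat.completions.create(
        model="glm-5",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a haiku about recursion."},
        ],
        max_tokens=512,  # must stay within the 16.4K max output limit
        temperature=0.7,
    )

    # The reply text is on the first choice's message.
    print(response.choices[0].message.content)
except APIError as e:
    # APIError is the base class for HTTP error responses and connection failures.
    print(f"API error: {e}")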