ChatML vs Harmony: Understanding the new Format from OpenAI 🔍

Community Article Published August 9, 2025

OpenAI just dropped their new Harmony format with the gpt-oss models, introducing a completely different approach to structuring reasoning and tool calls compared to the ChatML format we've been using with models like Qwen3.

If you're working with reasoning models or building inference infrastructure, understanding these formats is crucial. Today, we're diving deep into both formats with side-by-side comparisons to help you understand what's changed and why it matters!

What is ChatML? 📝

ChatML (Chat Markup Language) has been the go-to format for many open-source models, particularly the Qwen family. It's essentially an XML-inspired way to structure conversations that evolved from the need to clearly separate different parts of a conversation.

Key characteristics:

Uses special tokens <|im_start|> and <|im_end|> to mark message boundaries
Simple role-based structure (system, user, assistant)
Thinking/reasoning wrapped in <think>...</think> blocks
Tool definitions and calls use XML-style tags
Proven, battle-tested format used across many models

Think of ChatML as the "classic" approach - straightforward, XML-like, and focused on clarity.

What is Harmony? 🎭

Harmony is OpenAI's new response format designed specifically for the gpt-oss models. It's not just a prompt format - it's a complete rethinking of how models should structure their outputs, especially for complex reasoning and tool use.

Key innovations:

Multi-channel architecture: Messages can be tagged as analysis (reasoning), commentary (tool preambles), or final (user-facing)
Role hierarchy: system > developer > user > assistant > tool (for handling instruction conflicts)
Message routing: Uses to= syntax for directing messages between components
TypeScript-style tool definitions: More expressive and concise than JSON schemas
Separate safety standards per channel: The analysis channel isn't safety-filtered like final

Harmony treats the model's output like a multi-threaded conversation where different types of content flow through different channels.

Harmony's TypeScript-Like Syntax: Why It Matters 🚀

One of Harmony's most interesting design choices is using TypeScript-style definitions for tools instead of JSON schemas. This isn't just syntactic sugar - there's solid reasoning behind it.

The Power of Code-Based Definitions

As demonstrated by recent research on code agents, expressing actions in code has several advantages:

Conciseness: Code actions are ~30% more compact than JSON
Parallelism: Need 4 parallel streams of 5 actions? In JSON that's 20 separate blobs; in code it's one expression
Variable management: Easy to store and reference results (rock_image = generate_image("rock"))
Readability: Agent logs become much cleaner
LLM fluency: Models have tons of code in their training data

Comparing Tool Definitions

ChatML (JSON Schema):

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
      },
      "required": ["city"]
    }
  }
}

Harmony (TypeScript-style):

namespace functions {
  // Get weather for a city
  type get_weather = (_: {
    city: string,
    unit?: "celsius" | "fahrenheit", // default: celsius
  }) => any;
}

The TypeScript version is not only more concise but also more familiar to developers and easier for LLMs to work with.

Diving Into the Examples 🏊‍♂️

Let's see how these formats handle real-world scenarios, from simple chats to complex multi-step reasoning with tools.

1. Basic Conversation (No Thinking)

The simplest case - just asking a question without any reasoning steps.

ChatML (Qwen3)

<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
<think>

</think>

The capital of France is Paris.<|im_end|>

OpenAI Harmony

<|start|>user<|message|>What's the capital of France?<|end|>
<|start|>assistant<|channel|>final<|message|>The capital of France is Paris.<|return|>

What's happening: Even with thinking disabled, Qwen3 includes empty <think></think> tags as a structural hint. Harmony is more direct - it jumps straight to the final channel for user-facing content.

2. Conversation with Thinking

Here's where the philosophical differences become clear.

ChatML (Qwen3)

<|im_start|>user
If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|im_end|>
<|im_start|>assistant
<think>
Starting with 3 apples
Adding 5 more: 3 + 5 = 8
Giving away 2: 8 - 2 = 6
So the final answer is 6 apples.
</think>

You would have 6 apples. 

Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|im_end|>

OpenAI Harmony

<|start|>user<|message|>If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User starts with 3 apples. Buys 5 more: 3 + 5 = 8. Gives away 2: 8 - 2 = 6. Final count is 6 apples.<|end|>
<|start|>assistant<|channel|>final<|message|>You would have 6 apples.

Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|return|>

What's different:

ChatML keeps everything in one message with inline thinking
Harmony splits reasoning (analysis channel) and answer (final channel) into separate messages
Harmony's approach makes it easier to filter reasoning from user-facing content programmatically

3. Multi-Turn Conversation with Thinking

This reveals how each format handles conversation history.

ChatML (Qwen3)

<|im_start|>user
What's 15% of 80?<|im_end|>
<|im_start|>assistant
15% of 80 is 12.

To calculate: 80 × 0.15 = 12<|im_end|>
<|im_start|>user
Now add 25 to that result<|im_end|>
<|im_start|>assistant
<think>
The previous result was 12.
Adding 25: 12 + 25 = 37
</think>

Adding 25 to 12 gives us 37.<|im_end|>

Important: Qwen3 removes thinking from previous turns to save tokens!

OpenAI Harmony

<|start|>user<|message|>What's 15% of 80?<|end|>
<|start|>assistant<|channel|>analysis<|message|>Calculate 15% of 80: 80 × 0.15 = 12<|end|>
<|start|>assistant<|channel|>final<|message|>15% of 80 is 12.<|return|>
<|start|>user<|message|>Now add 25 to that result<|end|>
<|start|>assistant<|channel|>analysis<|message|>Previous result was 12. Adding 25: 12 + 25 = 37<|end|>
<|start|>assistant<|channel|>final<|message|>Adding 25 to 12 gives us 37.<|return|>

History management: Both formats prune historical reasoning, but Harmony's channel system makes this more explicit and controllable.

4. Function Calling

The formats diverge significantly in how they handle tools.

ChatML (Qwen3)

<|im_start|>system
# Tools

You may call one or more functions to assist with the user query.

<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["city"]}}}
</tools>

For each function call, return a json object within <tool_call></tool_call> XML tags.<|im_end|>
<|im_start|>user
What's the weather in Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User wants Tokyo weather. Need to call get_weather function.
</think>

I'll check the current weather in Tokyo for you.

<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 22, "condition": "partly cloudy", "humidity": 65}
</tool_response><|im_end|>
<|im_start|>assistant
The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|im_end|>

OpenAI Harmony

<|start|>developer<|message|># Tools
## functions
namespace functions {
// Get weather for a city
type get_weather = (_: {
  city: string,
  unit?: "celsius" | "fahrenheit", // default: celsius
}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>What's the weather in Tokyo?<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll check the current weather in Tokyo for you.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo", "unit": "celsius"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 22, "condition": "partly cloudy", "humidity": 65}<|end|>
<|start|>assistant<|channel|>final<|message|>The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|return|>

Key differences:

Tool definitions: JSON schemas vs TypeScript-style types
Tool responses: ChatML wraps them as user messages; Harmony uses a dedicated tool role
Message routing: Harmony's to= syntax explicitly shows the flow between components
Channels: Harmony separates tool interactions (commentary) from final answers

5. Function Calling with Chain-of-Thought

Complex scenarios really showcase the architectural differences.

ChatML (Qwen3)

<|im_start|>system
# Tools
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}
{"type": "function", "function": {"name": "get_flight_status", "description": "Check flight status", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string"}}}}}
</tools><|im_end|>
<|im_start|>user
Should I pack an umbrella for my UA902 flight to Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User is asking about packing an umbrella for flight UA902 to Tokyo.
I need to:
1. Check the flight status to confirm it's going to Tokyo
2. Check Tokyo's weather to see if rain is expected
Let me start with the flight status.
</think>

I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.

<tool_call>
{"name": "get_flight_status", "arguments": {"flight_number": "UA902"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Flight confirmed to Tokyo, departing at 14:30. Now checking weather.
</think>

<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 18, "condition": "rainy", "precipitation": "80%"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Tokyo weather shows rainy conditions with 80% precipitation. Definitely need an umbrella.
</think>

Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|im_end|>

OpenAI Harmony

<|start|>developer<|message|># Tools
## functions
namespace functions {
type get_weather = (_: {city: string}) => any;
type get_flight_status = (_: {flight_number: string}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>Should I pack an umbrella for my UA902 flight to Tokyo?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User asking about packing umbrella for UA902 to Tokyo. Need to: 1) Verify flight destination 2) Check Tokyo weather. Starting with flight status.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_flight_status<|constrain|>json<|message|>{"flight_number": "UA902"}<|call|>
<|start|>functions.get_flight_status to=assistant<|channel|>commentary<|message|>{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Flight confirmed to Tokyo, on time at 14:30. Now checking weather conditions.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 18, "condition": "rainy", "precipitation": "80%"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Tokyo has rainy conditions with 80% precipitation. Clear recommendation to pack umbrella.<|end|>
<|start|>assistant<|channel|>final<|message|>Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|return|>

In complex scenarios:

ChatML preserves thinking across tool calls within the same conversation turn
Harmony explicitly tracks each reasoning step with separate analysis messages
Harmony's commentary channel can include user-facing "preambles" before tool execution
The message routing (to=) in Harmony creates a clear execution trace

What This Means for the Ecosystem 🌍

The introduction of Harmony represents a significant evolution in how we think about model outputs:

Channel-based safety: Different safety standards for different channels is a game-changer for reasoning models
Better observability: Explicit routing and channels make it easier to debug complex agent behaviors
Code-first tools: TypeScript syntax could become the new standard for tool definitions
Structured reasoning: Separating thinking from output at the format level, not just convention

Key Takeaways 📌

ChatML is simpler and more established, with broad ecosystem support
Harmony introduces powerful new concepts like channels and message routing
Both formats handle reasoning and tools, but with very different philosophies
Harmony's TypeScript-style definitions are more concise and developer-friendly
The channel system in Harmony enables fine-grained control over what users see

Understanding both formats is crucial as the ecosystem evolves. While we can't choose which format a model uses (we can, but only when we train in base model), knowing how they work helps us build better inference infrastructure and get the most out of these powerful reasoning models.

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote