ChatML vs Harmony: Understanding the new Format from OpenAI 🔍
OpenAI just dropped their new Harmony format with the gpt-oss models, introducing a completely different approach to structuring reasoning and tool calls compared to the ChatML format we've been using with models like Qwen3.
If you're working with reasoning models or building inference infrastructure, understanding these formats is crucial. Today, we're diving deep into both formats with side-by-side comparisons to help you understand what's changed and why it matters!
What is ChatML? 📝
ChatML (Chat Markup Language) has been the go-to format for many open-source models, particularly the Qwen family. It's essentially an XML-inspired way to structure conversations that evolved from the need to clearly separate different parts of a conversation.
Key characteristics:
- Uses special tokens
<|im_start|>
and<|im_end|>
to mark message boundaries - Simple role-based structure (system, user, assistant)
- Thinking/reasoning wrapped in
<think>...</think>
blocks - Tool definitions and calls use XML-style tags
- Proven, battle-tested format used across many models
Think of ChatML as the "classic" approach - straightforward, XML-like, and focused on clarity.
What is Harmony? 🎭
Harmony is OpenAI's new response format designed specifically for the gpt-oss models. It's not just a prompt format - it's a complete rethinking of how models should structure their outputs, especially for complex reasoning and tool use.
Key innovations:
- Multi-channel architecture: Messages can be tagged as
analysis
(reasoning),commentary
(tool preambles), orfinal
(user-facing) - Role hierarchy: system > developer > user > assistant > tool (for handling instruction conflicts)
- Message routing: Uses
to=
syntax for directing messages between components - TypeScript-style tool definitions: More expressive and concise than JSON schemas
- Separate safety standards per channel: The
analysis
channel isn't safety-filtered likefinal
Harmony treats the model's output like a multi-threaded conversation where different types of content flow through different channels.
Harmony's TypeScript-Like Syntax: Why It Matters 🚀
One of Harmony's most interesting design choices is using TypeScript-style definitions for tools instead of JSON schemas. This isn't just syntactic sugar - there's solid reasoning behind it.
The Power of Code-Based Definitions
As demonstrated by recent research on code agents, expressing actions in code has several advantages:
- Conciseness: Code actions are ~30% more compact than JSON
- Parallelism: Need 4 parallel streams of 5 actions? In JSON that's 20 separate blobs; in code it's one expression
- Variable management: Easy to store and reference results (
rock_image = generate_image("rock")
) - Readability: Agent logs become much cleaner
- LLM fluency: Models have tons of code in their training data
Comparing Tool Definitions
ChatML (JSON Schema):
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["city"]
}
}
}
Harmony (TypeScript-style):
namespace functions {
// Get weather for a city
type get_weather = (_: {
city: string,
unit?: "celsius" | "fahrenheit", // default: celsius
}) => any;
}
The TypeScript version is not only more concise but also more familiar to developers and easier for LLMs to work with.
Diving Into the Examples 🏊♂️
Let's see how these formats handle real-world scenarios, from simple chats to complex multi-step reasoning with tools.
1. Basic Conversation (No Thinking)
The simplest case - just asking a question without any reasoning steps.
ChatML (Qwen3)
<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
<think>
</think>
The capital of France is Paris.<|im_end|>
OpenAI Harmony
<|start|>user<|message|>What's the capital of France?<|end|>
<|start|>assistant<|channel|>final<|message|>The capital of France is Paris.<|return|>
What's happening: Even with thinking disabled, Qwen3 includes empty <think></think>
tags as a structural hint. Harmony is more direct - it jumps straight to the final
channel for user-facing content.
2. Conversation with Thinking
Here's where the philosophical differences become clear.
ChatML (Qwen3)
<|im_start|>user
If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|im_end|>
<|im_start|>assistant
<think>
Starting with 3 apples
Adding 5 more: 3 + 5 = 8
Giving away 2: 8 - 2 = 6
So the final answer is 6 apples.
</think>
You would have 6 apples.
Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|im_end|>
OpenAI Harmony
<|start|>user<|message|>If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User starts with 3 apples. Buys 5 more: 3 + 5 = 8. Gives away 2: 8 - 2 = 6. Final count is 6 apples.<|end|>
<|start|>assistant<|channel|>final<|message|>You would have 6 apples.
Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|return|>
What's different:
- ChatML keeps everything in one message with inline thinking
- Harmony splits reasoning (
analysis
channel) and answer (final
channel) into separate messages - Harmony's approach makes it easier to filter reasoning from user-facing content programmatically
3. Multi-Turn Conversation with Thinking
This reveals how each format handles conversation history.
ChatML (Qwen3)
<|im_start|>user
What's 15% of 80?<|im_end|>
<|im_start|>assistant
15% of 80 is 12.
To calculate: 80 × 0.15 = 12<|im_end|>
<|im_start|>user
Now add 25 to that result<|im_end|>
<|im_start|>assistant
<think>
The previous result was 12.
Adding 25: 12 + 25 = 37
</think>
Adding 25 to 12 gives us 37.<|im_end|>
Important: Qwen3 removes thinking from previous turns to save tokens!
OpenAI Harmony
<|start|>user<|message|>What's 15% of 80?<|end|>
<|start|>assistant<|channel|>analysis<|message|>Calculate 15% of 80: 80 × 0.15 = 12<|end|>
<|start|>assistant<|channel|>final<|message|>15% of 80 is 12.<|return|>
<|start|>user<|message|>Now add 25 to that result<|end|>
<|start|>assistant<|channel|>analysis<|message|>Previous result was 12. Adding 25: 12 + 25 = 37<|end|>
<|start|>assistant<|channel|>final<|message|>Adding 25 to 12 gives us 37.<|return|>
History management: Both formats prune historical reasoning, but Harmony's channel system makes this more explicit and controllable.
4. Function Calling
The formats diverge significantly in how they handle tools.
ChatML (Qwen3)
<|im_start|>system
# Tools
You may call one or more functions to assist with the user query.
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["city"]}}}
</tools>
For each function call, return a json object within <tool_call></tool_call> XML tags.<|im_end|>
<|im_start|>user
What's the weather in Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User wants Tokyo weather. Need to call get_weather function.
</think>
I'll check the current weather in Tokyo for you.
<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 22, "condition": "partly cloudy", "humidity": 65}
</tool_response><|im_end|>
<|im_start|>assistant
The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|im_end|>
OpenAI Harmony
<|start|>developer<|message|># Tools
## functions
namespace functions {
// Get weather for a city
type get_weather = (_: {
city: string,
unit?: "celsius" | "fahrenheit", // default: celsius
}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>What's the weather in Tokyo?<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll check the current weather in Tokyo for you.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo", "unit": "celsius"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 22, "condition": "partly cloudy", "humidity": 65}<|end|>
<|start|>assistant<|channel|>final<|message|>The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|return|>
Key differences:
- Tool definitions: JSON schemas vs TypeScript-style types
- Tool responses: ChatML wraps them as user messages; Harmony uses a dedicated tool role
- Message routing: Harmony's
to=
syntax explicitly shows the flow between components - Channels: Harmony separates tool interactions (
commentary
) from final answers
5. Function Calling with Chain-of-Thought
Complex scenarios really showcase the architectural differences.
ChatML (Qwen3)
<|im_start|>system
# Tools
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}
{"type": "function", "function": {"name": "get_flight_status", "description": "Check flight status", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string"}}}}}
</tools><|im_end|>
<|im_start|>user
Should I pack an umbrella for my UA902 flight to Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User is asking about packing an umbrella for flight UA902 to Tokyo.
I need to:
1. Check the flight status to confirm it's going to Tokyo
2. Check Tokyo's weather to see if rain is expected
Let me start with the flight status.
</think>
I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.
<tool_call>
{"name": "get_flight_status", "arguments": {"flight_number": "UA902"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Flight confirmed to Tokyo, departing at 14:30. Now checking weather.
</think>
<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 18, "condition": "rainy", "precipitation": "80%"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Tokyo weather shows rainy conditions with 80% precipitation. Definitely need an umbrella.
</think>
Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|im_end|>
OpenAI Harmony
<|start|>developer<|message|># Tools
## functions
namespace functions {
type get_weather = (_: {city: string}) => any;
type get_flight_status = (_: {flight_number: string}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>Should I pack an umbrella for my UA902 flight to Tokyo?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User asking about packing umbrella for UA902 to Tokyo. Need to: 1) Verify flight destination 2) Check Tokyo weather. Starting with flight status.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_flight_status<|constrain|>json<|message|>{"flight_number": "UA902"}<|call|>
<|start|>functions.get_flight_status to=assistant<|channel|>commentary<|message|>{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Flight confirmed to Tokyo, on time at 14:30. Now checking weather conditions.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 18, "condition": "rainy", "precipitation": "80%"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Tokyo has rainy conditions with 80% precipitation. Clear recommendation to pack umbrella.<|end|>
<|start|>assistant<|channel|>final<|message|>Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|return|>
In complex scenarios:
- ChatML preserves thinking across tool calls within the same conversation turn
- Harmony explicitly tracks each reasoning step with separate
analysis
messages - Harmony's
commentary
channel can include user-facing "preambles" before tool execution - The message routing (
to=
) in Harmony creates a clear execution trace
What This Means for the Ecosystem 🌍
The introduction of Harmony represents a significant evolution in how we think about model outputs:
- Channel-based safety: Different safety standards for different channels is a game-changer for reasoning models
- Better observability: Explicit routing and channels make it easier to debug complex agent behaviors
- Code-first tools: TypeScript syntax could become the new standard for tool definitions
- Structured reasoning: Separating thinking from output at the format level, not just convention
Key Takeaways 📌
- ChatML is simpler and more established, with broad ecosystem support
- Harmony introduces powerful new concepts like channels and message routing
- Both formats handle reasoning and tools, but with very different philosophies
- Harmony's TypeScript-style definitions are more concise and developer-friendly
- The channel system in Harmony enables fine-grained control over what users see
Understanding both formats is crucial as the ecosystem evolves. While we can't choose which format a model uses (we can, but only when we train in base model), knowing how they work helps us build better inference infrastructure and get the most out of these powerful reasoning models.