ChatML vs Harmony: Understanding the new Format from OpenAI 🔍

Community Article Published August 9, 2025

OpenAI just dropped their new Harmony format with the gpt-oss models, introducing a completely different approach to structuring reasoning and tool calls compared to the ChatML format we've been using with models like Qwen3.

If you're working with reasoning models or building inference infrastructure, understanding these formats is crucial. Today, we're diving deep into both formats with side-by-side comparisons to help you understand what's changed and why it matters!

What is ChatML? 📝

ChatML (Chat Markup Language) has been the go-to format for many open-source models, particularly the Qwen family. It's essentially an XML-inspired way to structure conversations that evolved from the need to clearly separate different parts of a conversation.

Key characteristics:

  • Uses special tokens <|im_start|> and <|im_end|> to mark message boundaries
  • Simple role-based structure (system, user, assistant)
  • Thinking/reasoning wrapped in <think>...</think> blocks
  • Tool definitions and calls use XML-style tags
  • Proven, battle-tested format used across many models

Think of ChatML as the "classic" approach - straightforward, XML-like, and focused on clarity.
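To make the token layout concrete, here is a minimal sketch of a ChatML renderer. The function name `render_chatml` and the `add_generation_prompt` flag are illustrative choices, not part of any library; in practice you would rely on the chat template that ships with the model rather than hand-rolling the string.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts as ChatML text (illustrative helper)."""
    parts = []
    for msg in messages:
        # Each message is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant message for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
]
print(render_chatml(chat, add_generation_prompt=True))
```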

What is Harmony? 🎭

Harmony is OpenAI's new response format designed specifically for the gpt-oss models. It's not just a prompt format - it's a complete rethinking of how models should structure their outputs, especially for complex reasoning and tool use.

Key innovations:

  • Multi-channel architecture: Messages can be tagged as analysis (reasoning), commentary (tool preambles), or final (user-facing)
  • Role hierarchy: system > developer > user > assistant > tool (for handling instruction conflicts)
  • Message routing: Uses to= syntax for directing messages between components
  • TypeScript-style tool definitions: More expressive and concise than JSON schemas
  • Separate safety standards per channel: Content in the analysis channel is not held to the same safety standards as final, so raw reasoning should not be shown to end users

Harmony treats the model's output like a multi-threaded conversation where different types of content flow through different channels.
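As a rough illustration of channels and routing, the sketch below assembles single Harmony messages as strings. `render_harmony` is a hypothetical helper written for this article, and the placement of `to=` simply mirrors the transcripts shown later in this post; it is not OpenAI's reference renderer.

```python
def render_harmony(role, content, channel=None, recipient=None,
                   constrain=None, end="<|end|>"):
    """Build one Harmony message string (hypothetical helper, not an official API)."""
    header = role
    if recipient and role != "assistant":
        # Tool responses: the recipient follows the sender, e.g. "functions.x to=assistant".
        header += f" to={recipient}"
    if channel:
        header += f"<|channel|>{channel}"
    if recipient and role == "assistant":
        # Assistant tool calls: the recipient follows the channel name.
        header += f" to={recipient}"
    if constrain:
        header += f"<|constrain|>{constrain}"
    return f"<|start|>{header}<|message|>{content}{end}"


print(render_harmony("user", "What's the weather in Tokyo?"))
print(render_harmony("assistant", '{"city": "Tokyo"}', channel="commentary",
                     recipient="functions.get_weather", constrain="json",
                     end="<|call|>"))
```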

Harmony's TypeScript-Like Syntax: Why It Matters 🚀

One of Harmony's most interesting design choices is using TypeScript-style definitions for tools instead of JSON schemas. This isn't just syntactic sugar - there's solid reasoning behind it.

The Power of Code-Based Definitions

As demonstrated by recent research on code agents, expressing actions in code has several advantages:

  1. Conciseness: Code actions take fewer steps than JSON actions (the CodeAct paper reports up to 30% fewer)
  2. Parallelism: Need 4 parallel streams of 5 actions? In JSON that's 20 separate blobs; in code it's one expression
  3. Variable management: Easy to store and reference results (rock_image = generate_image("rock"))
  4. Readability: Agent logs become much cleaner
  5. LLM fluency: Models have tons of code in their training data
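The parallelism and variable-management points can be sketched in a few lines of Python. `generate_image` here is a hypothetical stand-in that just returns a string, and the four subjects are made up for the example.

```python
def generate_image(prompt):
    """Hypothetical stand-in for a real image-generation tool."""
    return f"<image: {prompt}>"


subjects = ["rock", "tree", "river", "cloud"]

# JSON-style actions: the model has to emit one call object per action,
# so 4 streams x 5 actions means 20 separate blobs.
json_actions = [
    {"name": "generate_image", "arguments": {"prompt": f"{subject} v{i}"}}
    for subject in subjects
    for i in range(5)
]

# Code-style action: a single expression covers all 20 calls, and the
# results stay in a variable that later steps can reference.
images = {
    subject: [generate_image(f"{subject} v{i}") for i in range(5)]
    for subject in subjects
}

rock_image = images["rock"][0]
print(len(json_actions), len(images), rock_image)
```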

Comparing Tool Definitions

ChatML (JSON Schema):

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
      },
      "required": ["city"]
    }
  }
}

Harmony (TypeScript-style):

namespace functions {
  // Get weather for a city
  type get_weather = (_: {
    city: string,
    unit?: "celsius" | "fahrenheit", // default: celsius
  }) => any;
}

The TypeScript version is shorter (7 lines versus 15 here), reads like code developers already write, and is arguably closer to what LLMs see in their training data.
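To show that the two definitions carry the same information, here is a minimal sketch that converts a JSON-schema tool definition into the TypeScript-style form. The helper names `schema_to_ts` and `render_namespace` and the small `TYPE_MAP` are assumptions made for this example; the sketch only handles flat parameters and enums and does not emit default-value comments.

```python
import json

# Assumed mapping from JSON-schema primitive types to TypeScript types.
TYPE_MAP = {"string": "string", "number": "number",
            "integer": "number", "boolean": "boolean"}


def schema_to_ts(tool):
    """Convert one JSON-schema tool definition to a TypeScript-style type."""
    fn = tool["function"]
    params = fn.get("parameters", {})
    required = set(params.get("required", []))
    lines = []
    if fn.get("description"):
        lines.append(f"  // {fn['description']}")
    lines.append(f"  type {fn['name']} = (_: {{")
    for name, spec in params.get("properties", {}).items():
        if "enum" in spec:
            # Enums become a union of string literals.
            ts_type = " | ".join(json.dumps(value) for value in spec["enum"])
        else:
            ts_type = TYPE_MAP.get(spec.get("type"), "any")
        optional = "" if name in required else "?"
        lines.append(f"    {name}{optional}: {ts_type},")
    lines.append("  }) => any;")
    return "\n".join(lines)


def render_namespace(tools, namespace="functions"):
    """Wrap the converted tools in a namespace block."""
    body = "\n".join(schema_to_ts(tool) for tool in tools)
    return f"namespace {namespace} {{\n{body}\n}}"


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}
print(render_namespace([weather_tool]))
```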

Diving Into the Examples 🏊‍♂️

Let's see how these formats handle real-world scenarios, from simple chats to complex multi-step reasoning with tools.

1. Basic Conversation (No Thinking)

The simplest case - just asking a question without any reasoning steps.

ChatML (Qwen3)

<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
<think>

</think>

The capital of France is Paris.<|im_end|>

OpenAI Harmony

<|start|>user<|message|>What's the capital of France?<|end|>
<|start|>assistant<|channel|>final<|message|>The capital of France is Paris.<|return|>

What's happening: Even with thinking disabled, Qwen3 includes empty <think></think> tags as a structural hint. Harmony is more direct - it jumps straight to the final channel for user-facing content.

2. Conversation with Thinking

Here's where the philosophical differences become clear.

ChatML (Qwen3)

<|im_start|>user
If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|im_end|>
<|im_start|>assistant
<think>
Starting with 3 apples
Adding 5 more: 3 + 5 = 8
Giving away 2: 8 - 2 = 6
So the final answer is 6 apples.
</think>

You would have 6 apples. 

Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|im_end|>

OpenAI Harmony

<|start|>user<|message|>If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User starts with 3 apples. Buys 5 more: 3 + 5 = 8. Gives away 2: 8 - 2 = 6. Final count is 6 apples.<|end|>
<|start|>assistant<|channel|>final<|message|>You would have 6 apples.

Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|return|>

What's different:

  • ChatML keeps everything in one message with inline thinking
  • Harmony splits reasoning (analysis channel) and answer (final channel) into separate messages
  • Harmony's approach makes it easier to filter reasoning from user-facing content programmatically
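The filtering difference can be sketched with two small helpers. Both are illustrative regex-based functions written for this article (a production parser should work on tokens, not raw text): the ChatML one strips `<think>` blocks, while the Harmony one simply selects messages on the final channel.

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
HARMONY_MSG_RE = re.compile(
    r"<\|start\|>(?P<header>.*?)<\|message\|>(?P<content>.*?)"
    r"(?:<\|end\|>|<\|return\|>|<\|call\|>)",
    re.DOTALL,
)


def chatml_visible_text(assistant_text):
    """Remove inline <think> blocks from a ChatML assistant message."""
    return THINK_RE.sub("", assistant_text).strip()


def harmony_visible_text(raw):
    """Keep only assistant messages sent on the final channel."""
    finals = []
    for match in HARMONY_MSG_RE.finditer(raw):
        header = match.group("header")
        if header.startswith("assistant") and "<|channel|>final" in header:
            finals.append(match.group("content"))
    return "\n".join(finals)


chatml_reply = "<think>\n3 + 5 = 8, 8 - 2 = 6\n</think>\n\nYou would have 6 apples."
harmony_reply = (
    "<|start|>assistant<|channel|>analysis<|message|>3 + 5 = 8. 8 - 2 = 6.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>You would have 6 apples.<|return|>"
)
print(chatml_visible_text(chatml_reply))
print(harmony_visible_text(harmony_reply))
```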

3. Multi-Turn Conversation with Thinking

This reveals how each format handles conversation history.

ChatML (Qwen3)

<|im_start|>user
What's 15% of 80?<|im_end|>
<|im_start|>assistant
15% of 80 is 12.

To calculate: 80 × 0.15 = 12<|im_end|>
<|im_start|>user
Now add 25 to that result<|im_end|>
<|im_start|>assistant
<think>
The previous result was 12.
Adding 25: 12 + 25 = 37
</think>

Adding 25 to 12 gives us 37.<|im_end|>

Important: Qwen3's chat template strips <think> blocks from previous turns when it re-renders the history, which saves tokens!

OpenAI Harmony

<|start|>user<|message|>What's 15% of 80?<|end|>
<|start|>assistant<|channel|>final<|message|>15% of 80 is 12.<|end|>
<|start|>user<|message|>Now add 25 to that result<|end|>
<|start|>assistant<|channel|>analysis<|message|>Previous result was 12. Adding 25: 12 + 25 = 37<|end|>
<|start|>assistant<|channel|>final<|message|>Adding 25 to 12 gives us 37.<|return|>

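History management: Both formats prune reasoning from earlier turns. In Harmony, analysis messages from a completed turn are dropped when the history is re-rendered, and a stored final message ends with <|end|> because <|return|> is only a stop token at generation time. The channel system makes this pruning explicit and controllable.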
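A minimal sketch of that pruning step, assuming the conversation is held as a list of dicts with `role`, `channel`, and `content` keys (a representation chosen for this example, not a library type). It uses a simplified rule: any assistant reasoning that precedes the most recent user message is dropped.

```python
def prune_reasoning(messages):
    """Drop assistant analysis messages that precede the most recent user message."""
    last_user = max(
        (i for i, msg in enumerate(messages) if msg["role"] == "user"),
        default=-1,
    )
    return [
        msg for i, msg in enumerate(messages)
        # Keep everything in the current turn; in earlier turns keep
        # anything that is not assistant reasoning.
        if i > last_user
        or not (msg["role"] == "assistant" and msg.get("channel") == "analysis")
    ]


history = [
    {"role": "user", "content": "What's 15% of 80?"},
    {"role": "assistant", "channel": "analysis", "content": "80 x 0.15 = 12"},
    {"role": "assistant", "channel": "final", "content": "15% of 80 is 12."},
    {"role": "user", "content": "Now add 25 to that result"},
]
for msg in prune_reasoning(history):
    print(msg["role"], msg.get("channel"), msg["content"])
```

Reasoning produced after the latest user message is kept, which matters when the model is in the middle of a tool-calling loop.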

4. Function Calling

The formats diverge significantly in how they handle tools.

ChatML (Qwen3)

<|im_start|>system
# Tools

You may call one or more functions to assist with the user query.

<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["city"]}}}
</tools>

For each function call, return a json object within <tool_call></tool_call> XML tags.<|im_end|>
<|im_start|>user
What's the weather in Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User wants Tokyo weather. Need to call get_weather function.
</think>

I'll check the current weather in Tokyo for you.

<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 22, "condition": "partly cloudy", "humidity": 65}
</tool_response><|im_end|>
<|im_start|>assistant
The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|im_end|>

OpenAI Harmony

<|start|>developer<|message|># Tools
## functions
namespace functions {
// Get weather for a city
type get_weather = (_: {
  city: string,
  unit?: "celsius" | "fahrenheit", // default: celsius
}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>What's the weather in Tokyo?<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll check the current weather in Tokyo for you.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo", "unit": "celsius"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 22, "condition": "partly cloudy", "humidity": 65}<|end|>
<|start|>assistant<|channel|>final<|message|>The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|return|>

Key differences:

  • Tool definitions: JSON schemas vs TypeScript-style types
  • Tool responses: ChatML wraps them as user messages; Harmony uses a dedicated tool role
  • Message routing: Harmony's to= syntax explicitly shows the flow between components
  • Channels: Harmony separates tool interactions (commentary) from final answers
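On the inference side, these differences show up in how tool calls are pulled out of the raw model output. The two parsers below are a regex-based sketch written against the transcripts in this section; the function names are illustrative, and a real server would parse at the token level.

```python
import json
import re

CHATML_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
HARMONY_CALL_RE = re.compile(
    r"to=functions\.(?P<name>\w+)\s*(?:<\|constrain\|>\w+)?"
    r"<\|message\|>(?P<args>.*?)<\|call\|>",
    re.DOTALL,
)


def parse_chatml_calls(text):
    """The function name lives inside the JSON payload."""
    calls = []
    for raw in CHATML_CALL_RE.findall(text):
        payload = json.loads(raw)
        calls.append((payload["name"], payload["arguments"]))
    return calls


def parse_harmony_calls(text):
    """The function name lives in the to= recipient; the body is only the arguments."""
    return [
        (match.group("name"), json.loads(match.group("args")))
        for match in HARMONY_CALL_RE.finditer(text)
    ]


chatml_out = (
    "I'll check the current weather in Tokyo for you.\n\n<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}\n'
    "</tool_call>"
)
harmony_out = (
    "<|start|>assistant<|channel|>commentary to=functions.get_weather"
    '<|constrain|>json<|message|>{"city": "Tokyo", "unit": "celsius"}<|call|>'
)
print(parse_chatml_calls(chatml_out))
print(parse_harmony_calls(harmony_out))
```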

5. Function Calling with Chain-of-Thought

Complex scenarios really showcase the architectural differences.

ChatML (Qwen3)

<|im_start|>system
# Tools
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}
{"type": "function", "function": {"name": "get_flight_status", "description": "Check flight status", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string"}}}}}
</tools><|im_end|>
<|im_start|>user
Should I pack an umbrella for my UA902 flight to Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User is asking about packing an umbrella for flight UA902 to Tokyo.
I need to:
1. Check the flight status to confirm it's going to Tokyo
2. Check Tokyo's weather to see if rain is expected
Let me start with the flight status.
</think>

I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.

<tool_call>
{"name": "get_flight_status", "arguments": {"flight_number": "UA902"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Flight confirmed to Tokyo, departing at 14:30. Now checking weather.
</think>

<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 18, "condition": "rainy", "precipitation": "80%"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Tokyo weather shows rainy conditions with 80% precipitation. Definitely need an umbrella.
</think>

Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|im_end|>

OpenAI Harmony

<|start|>developer<|message|># Tools
## functions
namespace functions {
type get_weather = (_: {city: string}) => any;
type get_flight_status = (_: {flight_number: string}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>Should I pack an umbrella for my UA902 flight to Tokyo?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User asking about packing umbrella for UA902 to Tokyo. Need to: 1) Verify flight destination 2) Check Tokyo weather. Starting with flight status.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_flight_status<|constrain|>json<|message|>{"flight_number": "UA902"}<|call|>
<|start|>functions.get_flight_status to=assistant<|channel|>commentary<|message|>{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Flight confirmed to Tokyo, on time at 14:30. Now checking weather conditions.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 18, "condition": "rainy", "precipitation": "80%"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Tokyo has rainy conditions with 80% precipitation. Clear recommendation to pack umbrella.<|end|>
<|start|>assistant<|channel|>final<|message|>Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|return|>

In complex scenarios:

  • ChatML preserves thinking across tool calls within the same conversation turn
  • Harmony explicitly tracks each reasoning step with separate analysis messages
  • Harmony's commentary channel can include user-facing "preambles" before tool execution
  • The message routing (to=) in Harmony creates a clear execution trace
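The execution trace mentioned in the last point can be recovered with a few lines of parsing. This is a sketch that assumes the header layout used in the transcripts above; `harmony_trace` is a hypothetical helper, not part of any Harmony tooling.

```python
import re

MSG_RE = re.compile(
    r"<\|start\|>(?P<header>.*?)<\|message\|>(?P<content>.*?)"
    r"(?:<\|end\|>|<\|return\|>|<\|call\|>)",
    re.DOTALL,
)


def harmony_trace(raw):
    """Return (sender, recipient, channel) for each message in a transcript."""
    rows = []
    for match in MSG_RE.finditer(raw):
        # Ignore the content-type constraint when reading the routing info.
        header = match.group("header").split("<|constrain|>")[0]
        sender = re.match(r"[\w.]+", header).group(0)
        recipient = re.search(r"to=([\w.]+)", header)
        channel = re.search(r"<\|channel\|>(\w+)", header)
        rows.append((
            sender,
            recipient.group(1) if recipient else None,
            channel.group(1) if channel else None,
        ))
    return rows


transcript = (
    "<|start|>assistant<|channel|>commentary to=functions.get_flight_status"
    '<|constrain|>json<|message|>{"flight_number": "UA902"}<|call|>\n'
    "<|start|>functions.get_flight_status to=assistant<|channel|>commentary"
    '<|message|>{"flight": "UA902", "status": "On time"}<|end|>\n'
    "<|start|>assistant<|channel|>final<|message|>Your flight is on time.<|return|>"
)
for sender, recipient, channel in harmony_trace(transcript):
    print(f"{sender} -> {recipient or '-'} [{channel}]")
```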

What This Means for the Ecosystem 🌍

The introduction of Harmony represents a significant evolution in how we think about model outputs:

  1. Channel-based safety: Holding different channels to different safety standards could be a game-changer for reasoning models
  2. Better observability: Explicit routing and channels make it easier to debug complex agent behaviors
  3. Code-first tools: TypeScript syntax could become the new standard for tool definitions
  4. Structured reasoning: Separating thinking from output at the format level, not just convention

Key Takeaways 📌

  • ChatML is simpler and more established, with broad ecosystem support
  • Harmony introduces powerful new concepts like channels and message routing
  • Both formats handle reasoning and tools, but with very different philosophies
  • Harmony's TypeScript-style definitions are more concise and developer-friendly
  • The channel system in Harmony enables fine-grained control over what users see

Understanding both formats is crucial as the ecosystem evolves. We usually can't choose which format a model uses, since the format is fixed when a base model is fine-tuned, but knowing how both work helps us build better inference infrastructure and get the most out of these powerful reasoning models.
