4 Structured Output

Emily Johnson

Structured outputs constrain Claude's responses to follow a specific schema, ensuring valid, parseable output for downstream processing. Two complementary features are available, and they can be used independently or together in the same request. Structured outputs are generally available on the Claude API and Amazon Bedrock for Claude Opus 4.6, Claude Sonnet 4.6, Claude Sonnet 4.5, Claude Opus 4.5, and Claude Haiku 4.5, and are in public beta on Microsoft Foundry. Prompts and responses using structured outputs are processed with Zero Data Retention (ZDR).

However, the JSON schema itself is temporarily cached for up to 24 hours for optimization purposes. No prompt or response data is retained. Migrating from beta? The output_format parameter has moved to output_config.format, and beta headers are no longer required. The old beta header (structured-outputs-2025-11-13) and output_format parameter will continue working for a transition period. See code examples below for the updated API shape.
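As a rough sketch of the migration, a request body using the new location might look like the following. The nesting under output_config.format, and the field names inside it, are assumptions based on the migration note above, not a confirmed API shape; the invoice schema is illustrative.

```python
import json

# Hypothetical Messages API request body using the new output_config.format
# location described above. The exact nesting ("type"/"schema") is an
# assumption; consult the API reference for the confirmed shape.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_id": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_id", "total"],
    "additionalProperties": False,
}

payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Extract the invoice fields."}],
    # Previously a top-level "output_format" parameter; now nested here.
    "output_config": {
        "format": {
            "type": "json_schema",
            "schema": invoice_schema,
        }
    },
}

print(json.dumps(payload["output_config"], indent=2))
```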

JSON is one of the most widely used formats in the world for applications to exchange data. Structured Outputs is a feature that ensures the model will always generate responses that adhere to your supplied JSON Schema, so you don't need to worry about the model omitting a required key or hallucinating an invalid enum value. Benefits include reliable type safety, explicit refusals, and simpler prompting. In addition to supporting JSON Schema in the REST API, the OpenAI SDKs for Python and JavaScript also make it easy to define object schemas using Pydantic and Zod respectively. Below, you can see how to extract information from unstructured text that conforms to a schema defined in code. Structured Outputs is available in OpenAI's latest large language models, starting with GPT-4o.
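A minimal sketch of the Pydantic side of that workflow is shown below, assuming Pydantic v2. The CalendarEvent model and sample response are illustrative; the actual SDK call (a parse helper that accepts the model class as the response format) is omitted so the snippet stays self-contained, and only the local validation step is shown.

```python
from pydantic import BaseModel

# Define the schema in code; the SDK derives a JSON Schema from it and
# the parse helper returns a validated instance. Here we validate a
# sample model response locally instead of making a network call.
class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

raw = '{"name": "Science Fair", "date": "Friday", "participants": ["Alice", "Bob"]}'
event = CalendarEvent.model_validate_json(raw)
print(event.participants)
```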

Older models like gpt-4-turbo and earlier may use JSON mode instead. In limits, there is freedom. Creativity thrives within structure. While language models excel at generating human-like text, they face challenges when tasked with producing structured output in a consistent manner [Shorten et al., 2024, Tang et al., 2024]. This limitation becomes particularly problematic when integrating LLMs into production systems that require well-formatted data for downstream processing through databases, APIs, or other software applications. Even carefully crafted prompts cannot guarantee that an LLM will maintain the expected structure throughout its response.

But what user needs drive the demand for LLM output constraints? In a recent work by Google Research [Liu et al., 2024], the authors explored the user need for constraints on the output of large language models, drawing on a survey of 51 industry professionals. User needs can be broadly categorized as follows:

1. Improving Developer Efficiency and Workflow. Reducing trial and error in prompt engineering: developers find the process of crafting prompts to elicit desired output formats to be time-consuming, often involving extensive testing and iteration.

LLM output constraints could make this process more efficient and predictable. When it comes to LLM structured output, it seems like everything has been said already. We've got JSON schemas, Pydantic models, and enough Medium articles to fill a small library. Yet here's what rarely gets mentioned at AI conferences (where a MacBook plastered with brain illustration stickers is practically the entry fee): you can use examples to force LLMs to shape their responses to the structure you want. Few-shot examples represent a powerful alternative to model fine-tuning that eliminates the need for parameter optimization, dataset curation, and computational overhead. They provide immediate control over model behavior without requiring infrastructure changes or additional training cycles.

Examples serve as behavioral constraints that guide the model toward desired output patterns while maintaining inference efficiency. This approach proves particularly valuable in production environments where consistency and predictability are essential, because nothing ruins your day quite like an AI that decides to get creative with your API responses. And it's not only about the right format: you can also show the model how you expect the reasoning behind the output to look (as in chain-of-thought). Instead of just demanding the right JSON structure, you're essentially saying, "Hey, I want to see your mental math too." It's like having a transparent AI that can't help but explain how it arrived at its answer. Field-level examples provide granular control over individual output components, while complete object examples demonstrate full-context relationships between fields and establish structural patterns for complex outputs.
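As a minimal sketch of this idea, the prompt below embeds complete-object examples that carry both the JSON shape and a reasoning field, so the model imitates structure and chain-of-thought alike. The example records and the prompt template are illustrative, not tied to any particular library or API.

```python
import json

# Illustrative few-shot records: each pairs an input with the full
# output object we want the model to reproduce, reasoning included.
examples = [
    {"text": "Meeting moved to 3pm Friday.",
     "output": {"intent": "reschedule",
                "reasoning": "The sender changes an existing meeting time."}},
    {"text": "Can you send me the Q3 report?",
     "output": {"intent": "request_document",
                "reasoning": "The sender asks for a file."}},
]

def build_prompt(new_text: str) -> str:
    # Serialize each shot as Text/Output pairs, then leave the final
    # Output slot open for the model to complete in the same shape.
    shots = "\n\n".join(
        f"Text: {ex['text']}\nOutput: {json.dumps(ex['output'])}" for ex in examples
    )
    return f"{shots}\n\nText: {new_text}\nOutput:"

print(build_prompt("Please cancel my subscription."))
```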

Structured Outputs is a feature that lets the API return responses in a specific, organized format, like JSON or other schemas you define. Instead of getting free-form text, you receive data that's consistent and easy to parse. Ideal for tasks like document parsing, entity extraction, or report generation, it lets you define schemas using tools like Pydantic or Zod to enforce data types, constraints, and structure. When using structured outputs, the LLM's response is guaranteed to match your input schema. Structured outputs are supported by all language models, though only certain field types are supported in schemas.

Get reliable JSON from any LLM using structured outputs, JSON mode, Pydantic, Instructor, and Outlines, with production patterns for OpenAI, Claude, and Gemini. LLMs excel at creative tasks.

They write code, summarize documents, and draft emails with impressive results. But ask for structured JSON and you get inconsistent formats, malformed syntax, and unpredictable field names. The problem gets worse in production. A prompt that works perfectly in testing starts failing after a model update. Your JSON parser breaks on unexpected field types. Your application crashes because the LLM decided to rename "status" to "current_state" without warning.
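A small required-keys check is one way to turn that silent breakage into a loud, early failure instead of a crash deep in the application. The field names below are illustrative.

```python
import json

# Naive parsers that trust the model's field names break when "status"
# becomes "current_state". Validating required keys up front surfaces
# the drift as an explicit error at the parsing boundary.
REQUIRED = {"status", "order_id"}

def parse_order(raw: str) -> dict:
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"model response missing keys: {sorted(missing)}")
    return data

good = parse_order('{"status": "shipped", "order_id": "A-17"}')
print(good["status"])

try:
    parse_order('{"current_state": "shipped", "order_id": "A-17"}')
except ValueError as exc:
    print(exc)
```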

How can you ensure your large language models (LLMs) consistently produce outputs in the correct format, especially when performing tasks or calling APIs? What can you do to prevent your LLM from generating unpredictable or incomplete responses that can break your application? LLMs often produce unstructured outputs that require extensive post-processing before they can be used effectively. This unpredictability leads to errors, wasted time, and increased costs. OpenAI and Google introduced structured outputs to solve this problem. Structured outputs ensure model responses follow a strict format, reducing errors and making it easier to integrate LLMs into applications that require consistent, machine-readable data.

This guide will explain how structured outputs work, how they can be implemented in OpenAI and Gemini models, the benefits they offer, and the challenges you could face when using them. Structured outputs ensure model-generated responses follow pre-defined formats, such as JSON, XML, or Markdown. Where LLMs previously generated responses as free-form text with no specific structure, structured outputs are machine-readable, consistent, and can be easily integrated into other systems.

Structured outputs make a model follow a JSON Schema definition that you provide as part of your inference API call. This is in contrast to the older JSON mode feature, which guaranteed valid JSON would be generated, but was unable to ensure strict adherence to the supplied schema. Structured outputs are recommended for function calling, extracting structured data, and building complex multi-step workflows. You can use Pydantic to define object schemas in Python.
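For reference, the strict json_schema request shape used by OpenAI-style chat completion APIs looks roughly like this, in contrast to the older {"type": "json_object"} JSON mode; availability depends on the model and API version, and the person schema is illustrative.

```python
import json

# Strict structured-output request fragment: the schema must be a closed
# object (additionalProperties: False) with every property required, and
# "strict": True asks the API to enforce exact schema adherence rather
# than merely valid JSON.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
    "additionalProperties": False,
}

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "strict": True,
        "schema": schema,
    },
}

print(json.dumps(response_format, indent=2))
```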

Depending on what version of the OpenAI and Pydantic libraries you're running, you might need to upgrade to a newer version; these examples were tested against openai 1.42.0 and pydantic 2.8.2. If you are new to using Microsoft Entra ID for authentication, see How to configure Azure OpenAI in Microsoft Foundry Models with Microsoft Entra ID authentication.
