Understand Model Context Protocol (MCP) sampling
The Model Context Protocol (MCP) provides a standardized way for servers to request LLM sampling ("completions" or "generations") from language models via clients. This flow allows clients to maintain control over model access, selection, and permissions while enabling servers to leverage AI capabilities—with no server API keys necessary. Servers can request text, audio, or image-based interactions and optionally include context from MCP servers in their prompts.
User Interaction Model
Sampling in MCP allows servers to implement agentic behaviors by enabling LLM calls to occur nested inside other MCP server features.
Implementations are free to expose sampling through any interface pattern that suits their needs—the protocol itself does not mandate any specific user interaction model.
For trust & safety and security, there SHOULD always be a human in the loop with the ability to deny sampling requests.
Applications SHOULD:
- Provide UI that makes it easy and intuitive to review sampling requests
- Allow users to view and edit prompts before sending
- Present generated responses for review before delivery
Tools in Sampling
Servers can request that the client's LLM use tools during sampling by providing
a tools array and optional toolChoice configuration in their sampling
requests. This enables servers to implement agentic behaviors where the LLM can call
tools, receive results, and continue the conversation - all within a single sampling
request flow.
Clients MUST declare the sampling.tools capability to receive tool-enabled sampling requests. Servers MUST NOT send tool-enabled sampling requests to clients that have not declared this capability.
Capabilities
Clients that support sampling MUST declare the sampling
capability during initialization:
Basic sampling:
{
"capabilities": {
"sampling": {}
}
}
With tool use support:
{
"capabilities": {
"sampling": {
"tools": {}
}
}
}
With context inclusion support (soft-deprecated):
{
"capabilities": {
"sampling": {
"context": {}
}
}
}
The includeContext parameter values "thisServer" and "allServers" are soft-deprecated. Servers SHOULD avoid these values (for example, by simply omitting includeContext, which defaults to "none"), and SHOULD NOT use them unless the client declares the sampling.context capability. These values may be removed in future spec releases.
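Putting the capability gating together: a minimal TypeScript sketch of a server checking the declared capabilities before using tools or includeContext (the ClientCapabilities type and buildSamplingParams helper are illustrative, not part of the spec):

// Shape of the relevant capability flags, as declared above.
interface ClientCapabilities {
  sampling?: {
    tools?: object;
    context?: object;
  };
}

// Only attach tools / includeContext when the client has opted in.
function buildSamplingParams(caps: ClientCapabilities) {
  const params: Record<string, unknown> = {
    messages: [{ role: "user", content: { type: "text", text: "Hello" } }],
    maxTokens: 100,
  };
  if (caps.sampling?.tools) {
    params.tools = [/* tool definitions */];
  }
  if (caps.sampling?.context) {
    params.includeContext = "thisServer"; // otherwise omit; defaults to "none"
  }
  return params;
}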
Protocol Messages
Creating Messages
To request a language model generation, servers send a sampling/createMessage
request:
Request:
{
"jsonrpc": "2.0",
"id": 1,
"method": "sampling/createMessage",
"params": {
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "What is the capital of France?"
}
}
],
"modelPreferences": {
"hints": [
{
"name": "claude-3-sonnet"
}
],
"intelligencePriority": 0.8,
"speedPriority": 0.5
},
"systemPrompt": "You are a helpful assistant.",
"maxTokens": 100
}
}
Response:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"role": "assistant",
"content": {
"type": "text",
"text": "The capital of France is Paris."
},
"model": "claude-3-sonnet-20240307",
"stopReason": "endTurn"
}
}
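For orientation, the same exchange expressed in code: a minimal TypeScript sketch of a server issuing this request, assuming a generic sendRequest transport helper (hypothetical; a real server would typically go through an MCP SDK):

// Hypothetical transport helper: sends a JSON-RPC request, resolves with the result.
declare function sendRequest(method: string, params: unknown): Promise<any>;

async function askModel(question: string): Promise<string> {
  const result = await sendRequest("sampling/createMessage", {
    messages: [
      { role: "user", content: { type: "text", text: question } },
    ],
    modelPreferences: {
      hints: [{ name: "claude-3-sonnet" }],
      intelligencePriority: 0.8,
      speedPriority: 0.5,
    },
    systemPrompt: "You are a helpful assistant.",
    maxTokens: 100,
  });
  // In the non-tool case, result.content is a single text block.
  return result.content.text;
}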
Sampling with Tools
The following steps illustrate the complete flow of sampling with tools, including the multi-turn tool loop:
- Server -> Client: sampling/createMessage (messages + tools)
- Client -> User: present the request for human-in-the-loop review; the user approves or modifies it
- Client -> LLM: forward the request with tools
- LLM -> Client: response containing tool_use content (stopReason: "toolUse")
- Client -> User: present the tool calls for review; the user approves
- Client -> Server: return the tool_use response
- Server: execute the tool(s), e.g. run get_weather("Paris") and get_weather("London")
- Server -> Client: sampling/createMessage (history + tool_results + tools)
- Client -> User: present the continuation; the user approves
- Client -> LLM: forward the request with tool results
- LLM -> Client: final text response (stopReason: "endTurn")
- Client -> User: present the response; the user approves
- Client -> Server: return the final response; the server processes the result (and may continue the conversation)
To request LLM generation with tool use capabilities, servers include tools
and optionally toolChoice in the request:
Request (Server -> Client):
{
"jsonrpc": "2.0",
"id": 1,
"method": "sampling/createMessage",
"params": {
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "What's the weather like in Paris and London?"
}
}
],
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a city",
"inputSchema": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name"
}
},
"required": [ "city" ]
}
}
],
"toolChoice": {
"mode": "auto"
},
"maxTokens": 1000
}
}
Response (Client -> Server):
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "call_abc123",
"name": "get_weather",
"input": {
"city": "Paris"
}
},
{
"type": "tool_use",
"id": "call_def456",
"name": "get_weather",
"input": {
"city": "London"
}
}
],
"model": "claude-3-sonnet-20240307",
"stopReason": "toolUse"
}
}
Multi-turn Tool Loop
After receiving tool use requests from the LLM, the server typically:
- Executes the requested tool uses
- Sends a new sampling request with the tool results appended
- Receives the LLM's response (which might contain new tool uses)
- Repeats as many times as needed (servers might cap the maximum number of iterations and, for example, pass toolChoice: {mode: "none"} on the last iteration to force a final text result, as sketched below)
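A sketch of this loop in TypeScript, assuming hypothetical sendRequest and executeTool helpers (error handling and the human-review steps are omitted for brevity):

declare function sendRequest(method: string, params: unknown): Promise<any>;
declare function executeTool(name: string, input: unknown): Promise<string>;

const MAX_ITERATIONS = 5; // iteration cap, per the security considerations below

async function runToolLoop(messages: any[], tools: any[]): Promise<any> {
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const lastIteration = i === MAX_ITERATIONS - 1;
    const result = await sendRequest("sampling/createMessage", {
      messages,
      tools,
      // Force a final text answer on the last allowed iteration.
      toolChoice: { mode: lastIteration ? "none" : "auto" },
      maxTokens: 1000,
    });
    if (result.stopReason !== "toolUse") {
      return result; // final response; no more tool uses
    }
    // Append the assistant's tool_use message to the history.
    messages.push({ role: "assistant", content: result.content });
    // Execute every requested tool, then append a user message
    // containing ONLY the matching tool_result blocks.
    const blocks = Array.isArray(result.content) ? result.content : [result.content];
    const toolUses = blocks.filter((b: any) => b.type === "tool_use");
    const results = await Promise.all(
      toolUses.map(async (use: any) => ({
        type: "tool_result",
        toolUseId: use.id,
        content: [{ type: "text", text: await executeTool(use.name, use.input) }],
      }))
    );
    messages.push({ role: "user", content: results });
  }
  throw new Error("unreachable: the last iteration forces a non-tool response");
}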
Follow-up request (Server -> Client) with tool results:
{
"jsonrpc": "2.0",
"id": 2,
"method": "sampling/createMessage",
"params": {
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "What's the weather like in Paris and London?"
}
},
{
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "call_abc123",
"name": "get_weather",
"input": { "city": "Paris" }
},
{
"type": "tool_use",
"id": "call_def456",
"name": "get_weather",
"input": { "city": "London" }
}
]
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"toolUseId": "call_abc123",
"content": [
{
"type": "text",
"text": "Weather in Paris: 18°C, partly cloudy"
}
]
},
{
"type": "tool_result",
"toolUseId": "call_def456",
"content": [
{
"type": "text",
"text": "Weather in London: 15°C, rainy"
}
]
}
]
}
],
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a city",
"inputSchema": {
"type": "object",
"properties": {
"city": { "type": "string" }
},
"required": [ "city" ]
}
}
],
"maxTokens": 1000
}
}
Final response (Client -> Server):
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"role": "assistant",
"content": {
"type": "text",
"text": "Based on the current weather data:\n\n- **Paris**: 18°C and partly cloudy - quite pleasant!\n- **London**: 15°C and rainy - you'll want an umbrella.\n\nParis has slightly warmer and drier conditions today."
},
"model": "claude-3-sonnet-20240307",
"stopReason": "endTurn"
}
}
Message Content Constraints
Tool Result Messages
When a user message contains tool results (type: "tool_result"), it MUST contain ONLY tool results. Mixing tool results with other content types (text, image, audio) in the same message is not allowed.
This constraint ensures compatibility with provider APIs that use dedicated roles for tool results (e.g., OpenAI's "tool" role, Gemini's "function" role).
Valid - single tool result:
{
"role": "user",
"content": {
"type": "tool_result",
"toolUseId": "call_123",
"content": [{ "type": "text", "text": "Result data" }]
}
}
Valid - multiple tool results:
{
"role": "user",
"content": [
{
"type": "tool_result",
"toolUseId": "call_123",
"content": [{ "type": "text", "text": "Result 1" }]
},
{
"type": "tool_result",
"toolUseId": "call_456",
"content": [{ "type": "text", "text": "Result 2" }]
}
]
}
Invalid - mixed content:
{
"role": "user",
"content": [
{
"type": "text",
"text": "Here are the results:"
},
{
"type": "tool_result",
"toolUseId": "call_123",
"content": [{ "type": "text", "text": "Result data" }]
}
]
}
Tool Use and Result Balance
When tools are used in sampling, every assistant message containing ToolUseContent blocks MUST be followed by a user message consisting entirely of ToolResultContent blocks, with each tool use (with id: $id) matched by a corresponding tool result (with toolUseId: $id), before any other message.
This requirement ensures:
- Tool uses are always resolved before the conversation continues
- Provider APIs can concurrently process multiple tool uses and fetch their results in parallel
- The conversation maintains a consistent request-response pattern
Example valid sequence:
- User message: "What's the weather like in Paris and London?"
- Assistant message: ToolUseContent(id: "call_abc123", name: "get_weather", input: {city: "Paris"}) + ToolUseContent(id: "call_def456", name: "get_weather", input: {city: "London"})
- User message: ToolResultContent(toolUseId: "call_abc123", content: "18°C, partly cloudy") + ToolResultContent(toolUseId: "call_def456", content: "15°C, rainy")
- Assistant message: Text response comparing the weather in both cities
Invalid sequence - missing tool result:
- User message: "What's the weather like in Paris and London?"
- Assistant message: ToolUseContent(id: "call_abc123", name: "get_weather", input: {city: "Paris"}) + ToolUseContent(id: "call_def456", name: "get_weather", input: {city: "London"})
- User message: ToolResultContent(toolUseId: "call_abc123", content: "18°C, partly cloudy") ← Missing result for "call_def456"
- Assistant message: Text response (invalid - not all tool uses were resolved)
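Both constraints above (tool-result-only user messages, and tool use/result balance) are mechanical enough to check in code. An illustrative TypeScript validator (the types are simplified stand-ins, not the spec's):

type ContentBlock = { type: string; id?: string; toolUseId?: string };
type Message = { role: "user" | "assistant"; content: ContentBlock | ContentBlock[] };

function validateToolMessages(messages: Message[]): void {
  const blocksOf = (m: Message) => (Array.isArray(m.content) ? m.content : [m.content]);
  messages.forEach((msg, i) => {
    const blocks = blocksOf(msg);
    // Rule 1: tool results must not be mixed with other content types.
    const hasResults = blocks.some((b) => b.type === "tool_result");
    if (hasResults && !blocks.every((b) => b.type === "tool_result")) {
      throw new Error(`message ${i}: tool_result mixed with other content`);
    }
    // Rule 2: every assistant tool_use message must be followed immediately
    // by a user message resolving each tool use by toolUseId.
    const uses = blocks.filter((b) => b.type === "tool_use");
    if (msg.role === "assistant" && uses.length > 0) {
      const next = messages[i + 1];
      const resultIds = next ? new Set(blocksOf(next).map((b) => b.toolUseId)) : new Set();
      for (const use of uses) {
        if (!next || next.role !== "user" || !resultIds.has(use.id)) {
          throw new Error(`message ${i}: unresolved tool use ${use.id}`);
        }
      }
    }
  });
}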
Cross-API Compatibility
The sampling specification is designed to work across multiple LLM provider APIs (Claude, OpenAI, Gemini, etc.). Key design decisions for compatibility:
Message Roles
- MCP uses two roles: "user" and "assistant".
- Tool use requests are sent in CreateMessageResult with the "assistant" role.
- Tool results are sent back in messages with the "user" role.
- Messages with tool results cannot contain other kinds of content.
Tool Choice Modes
CreateMessageRequest.params.toolChoice controls the tool use ability
of the model:
- {mode: "auto"}: Model decides whether to use tools (default)
- {mode: "required"}: Model MUST use at least one tool before completing
- {mode: "none"}: Model MUST NOT use any tools
Parallel Tool Use
MCP allows models to make multiple tool use requests in parallel (returning an
array of ToolUseContent). All major provider APIs support this:
- Claude: Supports parallel tool use natively
- OpenAI: Supports parallel tool calls (can be disabled with parallel_tool_calls: false)
- Gemini: Supports parallel function calls natively
Implementations wrapping providers that support disabling parallel tool use MAY expose this as an extension, but it is not part of the core MCP specification.
Message Flow
The basic flow without tools mirrors the sequence above, minus the tool loop: the server sends a sampling/createMessage request, the client presents it for human review, forwards the approved request to the LLM, presents the generated response for review, and returns the approved result to the server.
Data Types
Messages
Sampling messages can contain:
Text Content
{
"type": "text",
"text": "The message content"
}
Image Content
{
"type": "image",
"data": "base64-encoded-image-data",
"mimeType": "image/jpeg"
}
Audio Content
{
"type": "audio",
"data": "base64-encoded-audio-data",
"mimeType": "audio/wav"
}
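These shapes translate directly into types. An illustrative TypeScript rendering (field meanings taken from the JSON above; the type names are informal, not normative):

interface TextContent {
  type: "text";
  text: string;
}

interface ImageContent {
  type: "image";
  data: string;     // base64-encoded image data
  mimeType: string; // e.g. "image/jpeg"
}

interface AudioContent {
  type: "audio";
  data: string;     // base64-encoded audio data
  mimeType: string; // e.g. "audio/wav"
}

type SamplingContent = TextContent | ImageContent | AudioContent;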
Model Preferences
Model selection in MCP requires careful abstraction since servers and clients may use different AI providers with distinct model offerings. A server cannot simply request a specific model by name since the client may not have access to that exact model or may prefer to use a different provider's equivalent model.
To solve this, MCP implements a preference system that combines abstract capability priorities with optional model hints:
Capability Priorities
Servers express their needs through three normalized priority values (0-1):
- costPriority: How important is minimizing costs? Higher values prefer cheaper models.
- speedPriority: How important is low latency? Higher values prefer faster models.
- intelligencePriority: How important are advanced capabilities? Higher values prefer more capable models.
Model Hints
While priorities help select models based on characteristics, hints
allow servers to suggest specific models or model families:
- Hints are treated as substrings that can match model names flexibly
- Multiple hints are evaluated in order of preference
- Clients MAY map hints to equivalent models from different providers
- Hints are advisory—clients make final model selection
For example:
{
"hints": [
{ "name": "claude-3-sonnet" }, // Prefer Sonnet-class models
{ "name": "claude" } // Fall back to any Claude model
],
"costPriority": 0.3, // Cost is less important
"speedPriority": 0.8, // Speed is very important
"intelligencePriority": 0.5 // Moderate capability needs
}
The client processes these preferences to select an appropriate model from its
available options. For instance, if the client doesn't have access to Claude models
but has Gemini, it might map the sonnet hint to gemini-1.5-pro based on
similar capabilities.
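One plausible client-side selection strategy combines hints and priorities. A hedged TypeScript sketch (the ModelInfo catalog and linear scoring are invented for illustration, not mandated by the spec):

// Hypothetical normalized model metadata maintained by the client.
interface ModelInfo {
  name: string;
  cost: number;         // 0-1, higher = cheaper
  speed: number;        // 0-1, higher = faster
  intelligence: number; // 0-1, higher = more capable
}

interface ModelPreferences {
  hints?: { name: string }[];
  costPriority?: number;
  speedPriority?: number;
  intelligencePriority?: number;
}

// Assumes a non-empty catalog. A real client might also map hints to
// equivalent models from other providers before substring matching.
function selectModel(prefs: ModelPreferences, available: ModelInfo[]): ModelInfo {
  // Hints first: substring match, in order of preference.
  for (const hint of prefs.hints ?? []) {
    const match = available.find((m) => m.name.includes(hint.name));
    if (match) return match;
  }
  // Otherwise score by the weighted capability priorities.
  const score = (m: ModelInfo) =>
    (prefs.costPriority ?? 0) * m.cost +
    (prefs.speedPriority ?? 0) * m.speed +
    (prefs.intelligencePriority ?? 0) * m.intelligence;
  return available.reduce((best, m) => (score(m) > score(best) ? m : best));
}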
Error Handling
Clients SHOULD return errors for common failure cases:
- User rejected sampling request: -1
- Tool result missing in request: -32602 (Invalid params)
- Tool results mixed with other content: -32602 (Invalid params)
Example errors:
{
"jsonrpc": "2.0",
"id": 3,
"error": {
"code": -1,
"message": "User rejected sampling request"
}
}
{
"jsonrpc": "2.0",
"id": 4,
"error": {
"code": -32602,
"message": "Tool result missing in request"
}
}
Security Considerations
- Clients SHOULD implement user approval controls
- Both parties SHOULD validate message content
- Clients SHOULD respect model preference hints
- Clients SHOULD implement rate limiting
- Both parties MUST handle sensitive data appropriately
When tools are used in sampling, additional security considerations apply:
- Servers MUST ensure that when replying to a stopReason: "toolUse", each ToolUseContent item is responded to with a ToolResultContent item with a matching toolUseId, and that the user message contains only tool results (no other content types)
- Both parties SHOULD implement iteration limits for tool loops
Source: https://modelcontextprotocol.io/specification/2025-11-25/client/sampling.md