Understand Model Context Protocol (MCP) sampling
The Model Context Protocol (MCP) provides a standardized way for servers to request LLM sampling ("completions" or "generations") from language models via clients. This flow allows clients to maintain control over model access, selection, and permissions while enabling servers to leverage AI capabilities—with no server API keys necessary. Servers can request text, audio, or image-based interactions and optionally include context from MCP servers in their prompts.
User Interaction Model
Sampling in MCP allows servers to implement agentic behaviors by enabling LLM calls to occur nested inside other MCP server features.
Implementations are free to expose sampling through any interface pattern that suits their needs—the protocol itself does not mandate any specific user interaction model.
For trust & safety and security, there SHOULD always be a human in the loop with the ability to deny sampling requests.
Applications SHOULD:
- Provide UI that makes it easy and intuitive to review sampling requests
- Allow users to view and edit prompts before sending
- Present generated responses for review before delivery
Tools in Sampling
Servers can request that the client's LLM use tools during sampling by providing
a tools array and optional toolChoice configuration in their sampling
requests. This enables servers to implement agentic behaviors where the LLM can call
tools, receive results, and continue the conversation - all within a single sampling
request flow.
Clients MUST declare the sampling.tools capability to receive tool-enabled sampling requests. Servers MUST NOT send tool-enabled sampling requests to clients that have not declared this capability.
Capabilities
Clients that support sampling MUST declare the sampling
capability during initialization:
Basic sampling:
{
"capabilities": {
"sampling": {}
}
}
With tool use support:
{
"capabilities": {
"sampling": {
"tools": {}
}
}
}
With context inclusion support (soft-deprecated):
{
"capabilities": {
"sampling": {
"context": {}
}
}
}
The includeContext parameter values "thisServer" and "allServers" are soft-deprecated. Servers SHOULD avoid these values (for example, by simply omitting includeContext, which defaults to "none"), and SHOULD NOT use them unless the client declares the sampling.context capability. These values may be removed in future spec releases.
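Putting the capability gating together: a minimal TypeScript sketch of a server checking the declared capabilities before using tools or includeContext (the ClientCapabilities type and buildSamplingParams helper are illustrative, not part of the spec):

// Shape of the relevant capability flags, as declared above.
interface ClientCapabilities {
  sampling?: {
    tools?: object;
    context?: object;
  };
}

// Only attach tools / includeContext when the client has opted in.
function buildSamplingParams(caps: ClientCapabilities) {
  const params: Record<string, unknown> = {
    messages: [{ role: "user", content: { type: "text", text: "Hello" } }],
    maxTokens: 100,
  };
  if (caps.sampling?.tools) {
    params.tools = [/* tool definitions */];
  }
  if (caps.sampling?.context) {
    params.includeContext = "thisServer"; // otherwise omit; defaults to "none"
  }
  return params;
}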
Protocol Messages
Creating Messages
To request a language model generation, servers send a sampling/createMessage
request:
Request:
{
"jsonrpc": "2.0",
"id": 1,
"method": "sampling/createMessage",
"params": {
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "What is the capital of France?"
}
}
],
"modelPreferences": {
"hints": [
{
"name": "claude-3-sonnet"
}
],
"intelligencePriority": 0.8,
"speedPriority": 0.5
},
"systemPrompt": "You are a helpful assistant.",
"maxTokens": 100
}
}
Response:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"role": "assistant",
"content": {
"type": "text",
"text": "The capital of France is Paris."
},
"model": "claude-3-sonnet-20240307",
"stopReason": "endTurn"
}
}
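For orientation, the same exchange expressed in code: a minimal TypeScript sketch of a server issuing this request, assuming a generic sendRequest transport helper (hypothetical; a real server would typically go through an MCP SDK):

// Hypothetical transport helper: sends a JSON-RPC request, resolves with the result.
declare function sendRequest(method: string, params: unknown): Promise<any>;

async function askModel(question: string): Promise<string> {
  const result = await sendRequest("sampling/createMessage", {
    messages: [
      { role: "user", content: { type: "text", text: question } },
    ],
    modelPreferences: {
      hints: [{ name: "claude-3-sonnet" }],
      intelligencePriority: 0.8,
      speedPriority: 0.5,
    },
    systemPrompt: "You are a helpful assistant.",
    maxTokens: 100,
  });
  // In the non-tool case, result.content is a single text block.
  return result.content.text;
}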
Sampling with Tools
The following steps illustrate the complete flow of sampling with tools, including the multi-turn tool loop:
- Server -> Client: sampling/createMessage (messages + tools)
- Client -> User: present the request for human-in-the-loop review; the user approves or modifies it
- Client -> LLM: forward the request with tools
- LLM -> Client: response containing tool_use content (stopReason: "toolUse")
- Client -> User: present the tool calls for review; the user approves
- Client -> Server: return the tool_use response
- Server: execute the tool(s), e.g. run get_weather("Paris") and get_weather("London")
- Server -> Client: sampling/createMessage (history + tool_results + tools)
- Client -> User: present the continuation; the user approves
- Client -> LLM: forward the request with tool results
- LLM -> Client: final text response (stopReason: "endTurn")
- Client -> User: present the response; the user approves
- Client -> Server: return the final response; the server processes the result (and may continue the conversation)
To request LLM generation with tool use capabilities, servers include tools
and optionally toolChoice in the request:
Request (Server -> Client):
{
"jsonrpc": "2.0",
"id": 1,
"method": "sampling/createMessage",
"params": {
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "What's the weather like in Paris and London?"
}
}
],
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a city",
"inputSchema": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name"
}
},
"required": [ "city" ]
}
}
],
"toolChoice": {
"mode": "auto"
},
"maxTokens": 1000
}
}
Response (Client -> Server):
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "call_abc123",
"name": "get_weather",
"input": {
"city": "Paris"
}
},
{
"type": "tool_use",
"id": "call_def456",
"name": "get_weather",
"input": {
"city": "London"
}
}
],
"model": "claude-3-sonnet-20240307",
"stopReason": "toolUse"
}
}
Multi-turn Tool Loop
After receiving tool use requests from the LLM, the server typically:
- Executes the requested tool uses
- Sends a new sampling request with the tool results appended
- Receives the LLM's response (which might contain new tool uses)
- Repeats as many times as needed (servers might cap the maximum number of iterations and, for example, pass toolChoice: {mode: "none"} on the last iteration to force a final text result, as sketched below)
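A sketch of this loop in TypeScript, assuming hypothetical sendRequest and executeTool helpers (error handling and the human-review steps are omitted for brevity):

declare function sendRequest(method: string, params: unknown): Promise<any>;
declare function executeTool(name: string, input: unknown): Promise<string>;

const MAX_ITERATIONS = 5; // iteration cap, per the security considerations below

async function runToolLoop(messages: any[], tools: any[]): Promise<any> {
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const lastIteration = i === MAX_ITERATIONS - 1;
    const result = await sendRequest("sampling/createMessage", {
      messages,
      tools,
      // Force a final text answer on the last allowed iteration.
      toolChoice: { mode: lastIteration ? "none" : "auto" },
      maxTokens: 1000,
    });
    if (result.stopReason !== "toolUse") {
      return result; // final response; no more tool uses
    }
    // Append the assistant's tool_use message to the history.
    messages.push({ role: "assistant", content: result.content });
    // Execute every requested tool, then append a user message
    // containing ONLY the matching tool_result blocks.
    const blocks = Array.isArray(result.content) ? result.content : [result.content];
    const toolUses = blocks.filter((b: any) => b.type === "tool_use");
    const results = await Promise.all(
      toolUses.map(async (use: any) => ({
        type: "tool_result",
        toolUseId: use.id,
        content: [{ type: "text", text: await executeTool(use.name, use.input) }],
      }))
    );
    messages.push({ role: "user", content: results });
  }
  throw new Error("unreachable: the last iteration forces a non-tool response");
}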
Follow-up request (Server -> Client) with tool results:
{
"jsonrpc": "2.0",
"id": 2,
"method": "sampling/createMessage",
"params": {
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "What's the weather like in Paris and London?"
}
},
{
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "call_abc123",
"name": "get_weather",
"input": { "city": "Paris" }
},
{
"type": "tool_use",
"id": "call_def456",
"name": "get_weather",
"input": { "city": "London" }
}
]
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"toolUseId": "call_abc123",
"content": [
{
"type": "text",
"text": "Weather in Paris: 18°C, partly cloudy"
}
]
},
{
"type": "tool_result",
"toolUseId": "call_def456",
"content": [
{
"type": "text",
"text": "Weather in London: 15°C, rainy"
}
]
}
]
}
],
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a city",
"inputSchema": {
"type": "object",
"properties": {
"city": { "type": "string" }
},
"required": [ "city" ]
}
}
],
"maxTokens": 1000
}
}
Final response (Client -> Server):
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"role": "assistant",
"content": {
"type": "text",
"text": "Based on the current weather data:\n\n- **Paris**: 18°C and partly cloudy - quite pleasant!\n- **London**: 15°C and rainy - you'll want an umbrella.\n\nParis has slightly warmer and drier conditions today."
},
"model": "claude-3-sonnet-20240307",
"stopReason": "endTurn"
}
}
Message Content Constraints
Tool Result Messages
When a user message contains tool results (type: "tool_result"), it MUST contain ONLY tool results. Mixing tool results with other content types (text, image, audio) in the same message is not allowed.
This constraint ensures compatibility with provider APIs that use dedicated roles for tool results (e.g., OpenAI's "tool" role, Gemini's "function" role).
Valid - single tool result:
{
"role": "user",
"content": {
"type": "tool_result",
"toolUseId": "call_123",
"content": [{ "type": "text", "text": "Result data" }]
}
}
Valid - multiple tool results:
{
"role": "user",
"content": [
{
"type": "tool_result",
"toolUseId": "call_123",
"content": [{ "type": "text", "text": "Result 1" }]
},
{
"type": "tool_result",
"toolUseId": "call_456",
"content": [{ "type": "text", "text": "Result 2" }]
}
]
}
Invalid - mixed content:
{
"role": "user",
"content": [
{
"type": "text",
"text": "Here are the results:"
},
{
"type": "tool_result",
"toolUseId": "call_123",
"content": [{ "type": "text", "text": "Result data" }]
}
]
}
Tool Use and Result Balance
When tools are used in sampling, every assistant message containing ToolUseContent blocks MUST be followed by a user message consisting entirely of ToolResultContent blocks, with each tool use (with id: $id) matched by a corresponding tool result (with toolUseId: $id), before any other message.
This requirement ensures:
- Tool uses are always resolved before the conversation continues
- Provider APIs can concurrently process multiple tool uses and fetch their results in parallel
- The conversation maintains a consistent request-response pattern
Example valid sequence:
- User message: "What's the weather like in Paris and London?"
- Assistant message: ToolUseContent(id: "call_abc123", name: "get_weather", input: {city: "Paris"}) + ToolUseContent(id: "call_def456", name: "get_weather", input: {city: "London"})
- User message: ToolResultContent(toolUseId: "call_abc123", content: "18°C, partly cloudy") + ToolResultContent(toolUseId: "call_def456", content: "15°C, rainy")
- Assistant message: Text response comparing the weather in both cities
Invalid sequence - missing tool result:
- User message: "What's the weather like in Paris and London?"
- Assistant message: ToolUseContent(id: "call_abc123", name: "get_weather", input: {city: "Paris"}) + ToolUseContent(id: "call_def456", name: "get_weather", input: {city: "London"})
- User message: ToolResultContent(toolUseId: "call_abc123", content: "18°C, partly cloudy") ← Missing result for "call_def456"
- Assistant message: Text response (invalid - not all tool uses were resolved)
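Both constraints above (tool-result-only user messages, and tool use/result balance) are mechanical enough to check in code. An illustrative TypeScript validator (the types are simplified stand-ins, not the spec's):

type ContentBlock = { type: string; id?: string; toolUseId?: string };
type Message = { role: "user" | "assistant"; content: ContentBlock | ContentBlock[] };

function validateToolMessages(messages: Message[]): void {
  const blocksOf = (m: Message) => (Array.isArray(m.content) ? m.content : [m.content]);
  messages.forEach((msg, i) => {
    const blocks = blocksOf(msg);
    // Rule 1: tool results must not be mixed with other content types.
    const hasResults = blocks.some((b) => b.type === "tool_result");
    if (hasResults && !blocks.every((b) => b.type === "tool_result")) {
      throw new Error(`message ${i}: tool_result mixed with other content`);
    }
    // Rule 2: every assistant tool_use message must be followed immediately
    // by a user message resolving each tool use by toolUseId.
    const uses = blocks.filter((b) => b.type === "tool_use");
    if (msg.role === "assistant" && uses.length > 0) {
      const next = messages[i + 1];
      const resultIds = next ? new Set(blocksOf(next).map((b) => b.toolUseId)) : new Set();
      for (const use of uses) {
        if (!next || next.role !== "user" || !resultIds.has(use.id)) {
          throw new Error(`message ${i}: unresolved tool use ${use.id}`);
        }
      }
    }
  });
}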
Cross-API Compatibility
The sampling specification is designed to work across multiple LLM provider APIs (Claude, OpenAI, Gemini, etc.). Key design decisions for compatibility:
Message Roles
- MCP uses two roles: "user" and "assistant".
- Tool use requests are sent in CreateMessageResult with the "assistant" role.
- Tool results are sent back in messages with the "user" role.
- Messages with tool results cannot contain other kinds of content.
Tool Choice Modes
CreateMessageRequest.params.toolChoice controls the tool use ability
of the model:
- {mode: "auto"}: Model decides whether to use tools (default)
- {mode: "required"}: Model MUST use at least one tool before completing
- {mode: "none"}: Model MUST NOT use any tools
Parallel Tool Use
MCP allows models to make multiple tool use requests in parallel (returning an
array of ToolUseContent). All major provider APIs support this:
- Claude: Supports parallel tool use natively
- OpenAI: Supports parallel tool calls (can be disabled with parallel_tool_calls: false)
- Gemini: Supports parallel function calls natively
Implementations wrapping providers that support disabling parallel tool use MAY expose this as an extension, but it is not part of the core MCP specification.
Message Flow
The basic flow without tools mirrors the sequence above, minus the tool loop: the server sends a sampling/createMessage request, the client presents it for human review, forwards the approved request to the LLM, presents the generated response for review, and returns the approved result to the server.
Data Types
Messages
Sampling messages can contain:
Text Content
{
"type": "text",
"text": "The message content"
}
Image Content
{
"type": "image",
"data": "base64-encoded-image-data",
"mimeType": "image/jpeg"
}
Audio Content
{
"type": "audio",
"data": "base64-encoded-audio-data",
"mimeType": "audio/wav"
}
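These shapes translate directly into types. An illustrative TypeScript rendering (field meanings taken from the JSON above; the type names are informal, not normative):

interface TextContent {
  type: "text";
  text: string;
}

interface ImageContent {
  type: "image";
  data: string;     // base64-encoded image data
  mimeType: string; // e.g. "image/jpeg"
}

interface AudioContent {
  type: "audio";
  data: string;     // base64-encoded audio data
  mimeType: string; // e.g. "audio/wav"
}

type SamplingContent = TextContent | ImageContent | AudioContent;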
Model Preferences
Model selection in MCP requires careful abstraction since servers and clients may use different AI providers with distinct model offerings. A server cannot simply request a specific model by name since the client may not have access to that exact model or may prefer to use a different provider's equivalent model.
To solve this, MCP implements a preference system that combines abstract capability priorities with optional model hints:
Capability Priorities
Servers express their needs through three normalized priority values (0-1):
- costPriority: How important is minimizing costs? Higher values prefer cheaper models.
- speedPriority: How important is low latency? Higher values prefer faster models.
- intelligencePriority: How important are advanced capabilities? Higher values prefer more capable models.
Model Hints
While priorities help select models based on characteristics, hints
allow servers to suggest specific models or model families:
- Hints are treated as substrings that can match model names flexibly
- Multiple hints are evaluated in order of preference
- Clients MAY map hints to equivalent models from different providers
- Hints are advisory—clients make final model selection
For example:
{
"hints": [
{ "name": "claude-3-sonnet" }, // Prefer Sonnet-class models
{ "name": "claude" } // Fall back to any Claude model
],
"costPriority": 0.3, // Cost is less important
"speedPriority": 0.8, // Speed is very important
"intelligencePriority": 0.5 // Moderate capability needs
}
The client processes these preferences to select an appropriate model from its
available options. For instance, if the client doesn't have access to Claude models
but has Gemini, it might map the sonnet hint to gemini-1.5-pro based on
similar capabilities.
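One plausible client-side selection strategy combines hints and priorities. A hedged TypeScript sketch (the ModelInfo catalog and linear scoring are invented for illustration, not mandated by the spec):

// Hypothetical normalized model metadata maintained by the client.
interface ModelInfo {
  name: string;
  cost: number;         // 0-1, higher = cheaper
  speed: number;        // 0-1, higher = faster
  intelligence: number; // 0-1, higher = more capable
}

interface ModelPreferences {
  hints?: { name: string }[];
  costPriority?: number;
  speedPriority?: number;
  intelligencePriority?: number;
}

// Assumes a non-empty catalog. A real client might also map hints to
// equivalent models from other providers before substring matching.
function selectModel(prefs: ModelPreferences, available: ModelInfo[]): ModelInfo {
  // Hints first: substring match, in order of preference.
  for (const hint of prefs.hints ?? []) {
    const match = available.find((m) => m.name.includes(hint.name));
    if (match) return match;
  }
  // Otherwise score by the weighted capability priorities.
  const score = (m: ModelInfo) =>
    (prefs.costPriority ?? 0) * m.cost +
    (prefs.speedPriority ?? 0) * m.speed +
    (prefs.intelligencePriority ?? 0) * m.intelligence;
  return available.reduce((best, m) => (score(m) > score(best) ? m : best));
}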
Error Handling
Clients SHOULD return errors for common failure cases:
- User rejected sampling request: -1
- Tool result missing in request: -32602 (Invalid params)
- Tool results mixed with other content: -32602 (Invalid params)
Example errors:
{
"jsonrpc": "2.0",
"id": 3,
"error": {
"code": -1,
"message": "User rejected sampling request"
}
}
{
"jsonrpc": "2.0",
"id": 4,
"error": {
"code": -32602,
"message": "Tool result missing in request"
}
}
Security Considerations
- Clients SHOULD implement user approval controls
- Both parties SHOULD validate message content
- Clients SHOULD respect model preference hints
- Clients SHOULD implement rate limiting
- Both parties MUST handle sensitive data appropriately
When tools are used in sampling, additional security considerations apply:
- Servers MUST ensure that when replying to a stopReason: "toolUse", each ToolUseContent item is responded to with a ToolResultContent item with a matching toolUseId, and that the user message contains only tool results (no other content types)
- Both parties SHOULD implement iteration limits for tool loops
Source: https://modelcontextprotocol.io/specification/2025-11-25/client/sampling.md