Model Context Protocol (MCP) is an open standard introduced by Anthropic in 2024 that defines how AI models connect to external data sources and tools. MCP standardises the interface between a host (the AI application / agent) and a server (a process exposing data or tools). Communication uses JSON-RPC 2.0 over stdio (local) or HTTP+SSE (remote). The four core primitives are Resources (data), Tools (actions), Prompts (templates), and Sampling (LLM call passthrough). For data engineers, MCP is the standard way to expose database query interfaces, pipeline status APIs, and data catalogs to any MCP-compatible AI client (Claude Desktop, VS Code Copilot, ADK, LangChain, etc.).
MCP has three roles:

- **Host**: the AI application the user interacts with (Claude Desktop, an IDE, a custom agent).
- **Client**: the connector the host creates internally, maintaining a 1:1 connection to a single server.
- **Server**: a process that exposes resources, tools, and prompts to clients.

One host can connect to many servers simultaneously (one client per server). Each server is isolated and exposes only the primitives it supports.
| Transport | How | Best For |
|---|---|---|
| stdio | Host spawns server as subprocess; JSON-RPC over stdin/stdout | Local tools, CLI integrations, development |
| HTTP + SSE | Server runs as HTTP endpoint; client POSTs requests, receives streamed responses via Server-Sent Events | Remote servers, cloud deployments, multi-client |
| Streamable HTTP (2025) | Replaces SSE; bidirectional streaming over a single HTTP connection | Production remote servers |
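Whatever the transport, every message is a JSON-RPC 2.0 envelope. A minimal sketch of one tool-call exchange on the wire (the `tools/call` method name follows the MCP spec; the tool name and payload here are illustrative, not from a real server):

```python
import json

# Request the host sends to invoke a tool on the server.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_table",
        "arguments": {"sql": "SELECT count(*) FROM orders"},
    },
}

# Response the server sends back: a result carrying content blocks.
response = {
    "jsonrpc": "2.0",
    "id": 1,  # matches the request id
    "result": {"content": [{"type": "text", "text": "count\n1042\n"}]},
}

# Over stdio each message is serialised as a single line of JSON.
wire = json.dumps(request)
print(wire)
```

The `id` field is what lets a client pair each response with its request when several calls are in flight.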
Resources expose data to the model (read-only context). Each resource has a URI scheme (e.g. db://orders/schema, file:///reports/q3.pdf). The model can read a resource to include its content in context. Resources support subscriptions — the server notifies the client when a resource changes.
Tools are callable functions the model can invoke to take actions or retrieve dynamic data. Each tool has a name, description, and a JSON Schema inputSchema. The host presents available tools to the LLM in its system prompt; the model emits a tool call JSON; the client routes it to the correct server; the server executes and returns a result. Tools are the workhorse primitive — most MCP integrations are tool-heavy.
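Concretely, a tool entry as returned by a server's `tools/list` response looks roughly like this (field names follow the MCP spec; this particular tool definition is an illustrative assumption):

```python
# What the host receives when it lists a server's tools: name,
# human-readable description, and a JSON Schema for the arguments.
tool = {
    "name": "query_table",
    "description": "Execute a read-only SQL query and return results as CSV.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "sql": {"type": "string", "description": "A SELECT statement."},
        },
        "required": ["sql"],
    },
}

# The host renders this into the model's prompt; the model then emits a
# call like {"name": "query_table", "arguments": {"sql": "SELECT ..."}}.
print(tool["name"], "->", sorted(tool["inputSchema"]["properties"]))
```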
Prompt templates with named arguments, stored on the server. Allow server owners to define standardised, versioned prompts for common tasks (e.g. "analyse_table_quality", "explain_dag_failure"). Clients can list and retrieve prompts; some hosts surface them as slash commands.
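A prompt template is ultimately just a parameterised function that returns prompt text; in the MCP Python SDK it would be registered on a FastMCP server with a prompt decorator. The sketch below shows only the template logic, using the `analyse_table_quality` name from this document (the wording of the prompt is an assumption):

```python
# Body of a server-side prompt template. On a FastMCP server this
# function would be registered with the SDK's prompt decorator; here it
# is shown standalone so the template logic is clear.
def analyse_table_quality(table_name: str, row_sample: int = 100) -> str:
    """Standardised prompt for assessing a table's data quality."""
    return (
        f"You are a data-quality analyst. Inspect up to {row_sample} sample "
        f"rows of table '{table_name}' and report: null rates per column, "
        f"duplicate keys, and columns whose values look mis-typed."
    )

print(analyse_table_quality("orders"))
```

Because the template lives on the server, every client that connects gets the same versioned wording instead of each user improvising their own.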
An advanced primitive: the server asks the client's host to make an LLM call on its behalf (e.g. for classification inside a tool). This allows servers to leverage the host's LLM without needing their own API key — useful for embedding LLM reasoning into server-side tool logic.
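A sampling request travels in the opposite direction to a tool call: the server sends it to the client. A sketch of the message shape (the `sampling/createMessage` method name follows the MCP spec; the classification payload is illustrative):

```python
# Request a server sends to the host, asking it to run an LLM completion
# on the server's behalf. The host may apply human-in-the-loop approval
# before forwarding it to the model.
sampling_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {
                    "type": "text",
                    "text": "Classify this log line as ERROR or WARN: ...",
                },
            }
        ],
        "maxTokens": 50,
    },
}
print(sampling_request["method"])
```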
An MCP server exposes BigQuery/Snowflake/DuckDB via: Resources for table schemas and sample data, Tools for running read-only SQL, and Prompts for common analysis patterns. Any MCP-compatible AI client (Claude Desktop, ADK agent, custom app) can query the warehouse without custom integration code per client.
An MCP server wraps the Airflow REST API and dbt Cloud API. Tools: list_dag_runs, get_task_logs, trigger_dag, get_dbt_run_status. Data engineers interact via Claude Desktop to check pipeline health, read error logs, and trigger reruns — without opening multiple dashboards.
An MCP server indexes the company data catalog (Datahub, OpenMetadata, or dbt docs). Resources expose lineage graphs; Tools answer "which tables depend on X", "who owns table Y", "what dbt tests cover Z". Reduces time-to-answer for data producer/consumer questions.
In a complex agent system, an orchestrating model (Claude) uses a Sampling-capable MCP server to delegate specific sub-tasks to specialised models (e.g. code generation to GPT-4o, classification to a fine-tuned local model) — without the client application needing direct access to each model's API.
```python
# pip install mcp
from mcp.server import FastMCP
import duckdb

mcp = FastMCP("Data Warehouse Server")
DB_PATH = "warehouse.ddb"

@mcp.tool()
def list_tables() -> list[str]:
    """List all tables in the warehouse."""
    conn = duckdb.connect(DB_PATH, read_only=True)
    return [r[0] for r in conn.execute("SHOW TABLES").fetchall()]

@mcp.tool()
def query_table(sql: str) -> str:
    """Execute a read-only SQL query and return results as a CSV string.

    Args:
        sql: A SELECT statement to execute.
    """
    if any(kw in sql.upper() for kw in ["DROP", "DELETE", "UPDATE", "INSERT"]):
        raise ValueError("Only SELECT statements are allowed")
    conn = duckdb.connect(DB_PATH, read_only=True)
    df = conn.execute(sql).fetchdf()
    return df.head(100).to_csv(index=False)  # cap output at 100 rows

@mcp.tool()
def describe_table(table_name: str) -> str:
    """Return the schema of a table.

    Args:
        table_name: The name of the table to describe.
    """
    conn = duckdb.connect(DB_PATH, read_only=True)
    return conn.execute(f"DESCRIBE {table_name}").fetchdf().to_string()

if __name__ == "__main__":
    mcp.run(transport="stdio")  # or transport="sse" for HTTP
```
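The keyword blocklist in `query_table` is a weak guard: it rejects harmless queries whose identifiers contain a blocked word (a column named `updated_at` contains `UPDATE`), and it is not a real parser. A slightly sturdier stdlib-only sketch checks the statement's leading keyword instead; this is still a heuristic of my own, best used alongside a `read_only` database connection rather than as the sole defence:

```python
import re

def is_read_only(sql: str) -> bool:
    """Allow only statements that begin with SELECT or WITH.

    A heuristic, not a SQL parser: combined with a read_only
    connection it serves as a second line of defence.
    """
    first = re.match(r"\s*(\w+)", sql)
    return bool(first) and first.group(1).upper() in {"SELECT", "WITH"}

assert is_read_only("SELECT * FROM orders")
assert is_read_only("  WITH t AS (SELECT 1) SELECT * FROM t")
assert not is_read_only("DROP TABLE orders")
# The substring blocklist would wrongly reject this harmless query:
assert is_read_only("SELECT updated_at FROM orders")
```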
```python
from mcp.server import FastMCP
import duckdb
import json

mcp = FastMCP("Schema Registry")
DB_PATH = "warehouse.ddb"

@mcp.resource("db://schemas")
def get_all_schemas() -> str:
    """Full data catalog: all tables with column names and types."""
    conn = duckdb.connect(DB_PATH, read_only=True)
    tables = [r[0] for r in conn.execute("SHOW TABLES").fetchall()]
    catalog = {}
    for tbl in tables:
        schema_df = conn.execute(f"DESCRIBE {tbl}").fetchdf()
        catalog[tbl] = schema_df[["column_name", "column_type"]].to_dict("records")
    return json.dumps(catalog, indent=2)

@mcp.resource("db://table/{table_name}/sample")
def get_table_sample(table_name: str) -> str:
    """Return 5 sample rows from a table."""
    conn = duckdb.connect(DB_PATH, read_only=True)
    return conn.execute(f"SELECT * FROM {table_name} LIMIT 5").fetchdf().to_csv(index=False)

if __name__ == "__main__":
    mcp.run(transport="stdio")
```
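The second resource above is a resource *template*: clients request a concrete URI such as `db://table/orders/sample` and the server binds the `{table_name}` segment to the handler argument. A regex-based sketch of that matching (FastMCP does this internally; this matcher is an illustration, not the SDK's implementation):

```python
import re

# URI template db://table/{table_name}/sample as a regex with a
# named capture group standing in for the template parameter.
TEMPLATE = r"^db://table/(?P<table_name>[^/]+)/sample$"

def match(uri: str):
    """Return handler kwargs for a matching URI, else None."""
    m = re.match(TEMPLATE, uri)
    return m.groupdict() if m else None

print(match("db://table/orders/sample"))  # {'table_name': 'orders'}
print(match("db://schemas"))              # None
```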
```python
import asyncio

from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI

async def main():
    client = MultiServerMCPClient({
        "warehouse": {
            "command": "python",
            "args": ["warehouse_mcp_server.py"],
            "transport": "stdio",
        },
        "airflow": {
            "url": "http://localhost:8000/sse",
            "transport": "sse",
        },
    })
    tools = await client.get_tools()  # all tools from all servers
    llm = ChatOpenAI(model="gpt-4o")
    prompt = hub.pull("hwchase17/react")
    agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
    executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
    result = await executor.ainvoke({
        "input": "Check all failed DAGs today and query the error counts by table"
    })
    print(result["output"])

asyncio.run(main())
```
Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):

```json
{
  "mcpServers": {
    "warehouse": {
      "command": "python",
      "args": ["/path/to/warehouse_mcp_server.py"],
      "env": {
        "DB_PATH": "/data/warehouse.ddb"
      }
    },
    "airflow": {
      "command": "python",
      "args": ["/path/to/airflow_mcp_server.py"],
      "env": {
        "AIRFLOW_URL": "http://airflow.internal:8080",
        "AIRFLOW_TOKEN": "your-api-token"
      }
    }
  }
}
```
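A malformed entry in this file typically fails silently (the server just never appears in the client), so a quick structural check before restarting Claude Desktop can save a debugging loop. A minimal sketch with stdlib `json`; the validation rules here are my own assumption, not an official schema:

```python
import json

# Minimal sanity check for a claude_desktop_config.json-style file:
# every stdio server entry needs a "command", and "args" must be a list.
config = json.loads("""
{
  "mcpServers": {
    "warehouse": {"command": "python", "args": ["warehouse_mcp_server.py"]}
  }
}
""")

for name, server in config["mcpServers"].items():
    assert "command" in server, f"{name}: missing 'command'"
    assert isinstance(server.get("args", []), list), f"{name}: 'args' must be a list"
print("config OK")
```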
| Approach | Standardised? | Multi-Client? | Transport | Discoverability | Best For |
|---|---|---|---|---|---|
| MCP | Yes (open spec) | Yes — any MCP host | stdio / HTTP+SSE | Native (tools/resources list) | Reusable tool servers for multiple AI clients |
| LangChain Tools | No (framework-specific) | LangChain only | In-process | Code definition | Tools within a single LangChain app |
| OpenAI Function Calling | No | OpenAI models only | REST | JSON Schema in prompt | Direct OpenAI API integrations |
| REST API | No (custom per API) | Any HTTP client | HTTP | OpenAPI spec (optional) | General service interoperability |
| Plugin (deprecated) | OpenAI-specific | ChatGPT only | HTTP | OpenAPI manifest | Legacy ChatGPT plugins |
Use async I/O (asyncio) for server implementations and enforce query timeouts.

Exercises:

- Build an MCP server exposing tools (list_tables, describe_table, query_table) and a resource (db://schemas). Connect it to Claude Desktop via claude_desktop_config.json and ask 5 natural-language questions about the warehouse.
- Add MCP Prompts: analyse_table_quality, compare_row_counts_over_time, and explain_table_for_stakeholder. Each should accept relevant parameters. Surface them in Claude Desktop and verify they render correctly.
- Use langchain-mcp-adapters to connect a LangChain ReAct agent to two separate MCP servers simultaneously (warehouse + a mock Airflow server). Run a test query that requires data from both servers to answer.