Why MCP Matters
The Model Context Protocol (MCP) was introduced by Anthropic in late 2024 and has since become the closest thing to a standard for connecting LLM applications to tools, data sources, and external systems. In 2026 it's supported by Claude Desktop, Claude Code, Cursor, Zed, OpenAI's recent clients, and a long tail of agents. If you maintain an internal API that AI assistants should be able to call safely — a ticketing system, a runbook store, a metrics backend — exposing it through MCP is the lowest-friction path.
This post builds the production-grade version of an MCP server, not a hello world. We cover authentication, authorization, rate limiting, observability, and deployment. The examples use the official TypeScript SDK.
The Five Concepts You Need
MCP defines a small vocabulary. Learn these five and you understand 90% of the spec.
- Server. A process that exposes capabilities.
- Client. An LLM application that connects to servers.
- Tools. Callable functions with typed arguments the model can invoke.
- Resources. Read-only data sources (files, records, pages) the model can include in context.
- Prompts. Parameterized prompt templates the server exposes to the client.
Most production MCP servers are really about tools and resources. Prompts are useful but secondary.
Transport: stdio Or HTTP
MCP supports multiple transports. For local development, stdio is the obvious choice — the client spawns the server as a subprocess and they talk over stdin/stdout. For production, you want Streamable HTTP, the network-friendly transport the spec standardized in 2025 to replace the original HTTP+SSE transport; it still uses Server-Sent Events for streaming responses, and it supports multiple clients, TLS, auth headers, and load balancers.
If your server exists to be consumed by agents running on engineers' laptops, ship both transports. If it exists to be consumed by a production agent platform, ship only HTTP.
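The stdio wiring is small enough to show up front. A sketch, assuming an ESM project ("type": "module") and the buildServer factory defined later in this post:
// stdio entrypoint for local development: the client spawns this as a subprocess
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { buildServer } from "./server.js";

const transport = new StdioServerTransport();
await buildServer().connect(transport);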
Project Layout
mcp-ticketing/
├── src/
│ ├── index.ts # entrypoint
│ ├── server.ts # server setup
│ ├── auth.ts # auth middleware
│ ├── tools/
│ │ ├── index.ts # re-exports the tool implementations
│ │ ├── search.ts
│ │ ├── create_ticket.ts
│ │ └── close_ticket.ts
│ ├── resources/
│ │ └── ticket.ts
│ ├── ratelimit.ts
│ └── telemetry.ts
├── Dockerfile
├── docker-compose.yaml
├── k8s/
│ ├── deployment.yaml
│ ├── service.yaml
│ └── ingress.yaml
├── package.json
└── tsconfig.json
Minimum Server With The Official SDK
The official @modelcontextprotocol/sdk package provides the protocol plumbing. You implement tools and resources; the SDK handles the rest.
// src/server.ts
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
import { searchTickets, createTicket, closeTicket } from "./tools/index.js";
import { readTicketResource } from "./resources/ticket.js";
export function buildServer(): McpServer {
const server = new McpServer(
{
name: "ticketing",
version: "1.2.0",
},
{
capabilities: {
tools: {},
resources: { subscribe: false, listChanged: true },
},
},
);
server.registerTool(
"search_tickets",
{
title: "Search tickets",
description: "Search the ticketing system by free text and optional status",
inputSchema: {
query: z.string().min(1).max(500),
status: z.enum(["open", "closed", "pending"]).optional(),
limit: z.number().int().min(1).max(50).default(10),
},
},
async (args, extra) => {
const results = await searchTickets(args, extra.sessionId);
return {
content: [
{ type: "text", text: JSON.stringify(results, null, 2) },
],
};
},
);
server.registerTool(
"create_ticket",
{
title: "Create ticket",
description: "Create a new ticket. Requires write permission.",
inputSchema: {
title: z.string().min(1).max(200),
body: z.string().min(1).max(10_000),
priority: z.enum(["low", "medium", "high"]).default("medium"),
},
},
async (args, extra) => {
const ticket = await createTicket(args, extra.sessionId);
return {
content: [{ type: "text", text: `Created ticket ${ticket.id}` }],
};
},
);
server.registerTool(
"close_ticket",
{
title: "Close ticket",
description: "Close an existing ticket by ID.",
inputSchema: {
id: z.string().regex(/^T-\d+$/),
resolution: z.string().max(1000),
},
},
async (args, extra) => {
await closeTicket(args, extra.sessionId);
return { content: [{ type: "text", text: `Closed ${args.id}` }] };
},
);
server.registerResource(
"ticket",
"ticket://{id}",
{
title: "Ticket",
description: "Read a ticket by ID",
mimeType: "application/json",
},
async (uri, { id }) => {
const ticket = await readTicketResource(String(id));
return {
contents: [
{
uri: uri.href,
mimeType: "application/json",
text: JSON.stringify(ticket),
},
],
};
},
);
return server;
}
Note the extra.sessionId — the SDK exposes a per-request context you can use to carry authenticated user identity into your tool handlers.
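What server.ts doesn't show is the entrypoint that binds the MCP server to HTTP. The SDK's StreamableHTTPServerTransport works against raw Node request and response objects, so the sketch below uses Hono with @hono/node-server, which exposes them as c.env.incoming and c.env.outgoing. The session map, route path, and RESPONSE_ALREADY_SENT plumbing here are one plausible arrangement, not spec requirements:
// src/index.ts: a minimal sketch of the HTTP wiring
import { serve, type HttpBindings } from "@hono/node-server";
import { RESPONSE_ALREADY_SENT } from "@hono/node-server/utils/response";
import { Hono } from "hono";
import { randomUUID } from "node:crypto";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { buildServer } from "./server.js";

const app = new Hono<{ Bindings: HttpBindings }>();

// health endpoints for the probes in the Kubernetes section below
app.get("/live", (c) => c.text("ok"));
app.get("/ready", (c) => c.text("ok"));

// one transport per MCP session, keyed by the mcp-session-id header
const transports = new Map<string, StreamableHTTPServerTransport>();

app.all("/mcp", async (c) => {
  const sessionId = c.req.header("mcp-session-id");
  let transport = sessionId ? transports.get(sessionId) : undefined;
  if (!transport) {
    transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => randomUUID(),
      onsessioninitialized: (id) => transports.set(id, transport!),
    });
    transport.onclose = () => {
      if (transport!.sessionId) transports.delete(transport!.sessionId);
    };
    await buildServer().connect(transport);
  }
  // the transport writes straight to the raw Node response
  const body = c.req.method === "POST" ? await c.req.json() : undefined;
  await transport.handleRequest(c.env.incoming, c.env.outgoing, body);
  return RESPONSE_ALREADY_SENT;
});

serve({ fetch: app.fetch, port: 8080 });
A per-process session map implies sticky sessions once you run more than one replica; the transport also supports a stateless mode (sessionIdGenerator: undefined) if you'd rather avoid that.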
Authentication: OAuth, Not API Keys
The MCP spec recommends OAuth 2.1 for HTTP transports. In practice you have three choices:
- OAuth 2.1 with PKCE — correct, flexible, most work.
- OIDC from your existing IdP — good if your clients already have user identity.
- Short-lived bearer tokens minted by a trusted control plane — easiest for machine-to-machine.
For a tool that represents authenticated user actions (creating tickets on someone's behalf), OAuth with the user's consent is the right model. For a tool that's scoped to a single service identity, signed bearer tokens are fine.
A minimal JWT verification middleware using Hono:
// src/auth.ts
import { jwtVerify, createRemoteJWKSet } from "jose";
import type { Context, Next } from "hono";
const JWKS = createRemoteJWKSet(new URL(process.env.OIDC_JWKS_URL!));
export async function requireAuth(c: Context, next: Next) {
const authHeader = c.req.header("authorization");
if (!authHeader?.startsWith("Bearer ")) {
return c.json({ error: "missing bearer token" }, 401);
}
try {
const { payload } = await jwtVerify(authHeader.slice(7), JWKS, {
issuer: process.env.OIDC_ISSUER,
audience: "mcp-ticketing",
});
c.set("principal", {
sub: payload.sub as string,
scopes: (payload.scope as string | undefined)?.split(" ") ?? [],
});
} catch {
return c.json({ error: "invalid token" }, 401);
}
await next();
}
export function requireScope(scope: string) {
return async (c: Context, next: Next) => {
const principal = c.get("principal") as { scopes: string[] } | undefined;
if (!principal?.scopes.includes(scope)) {
return c.json({ error: `missing scope: ${scope}` }, 403);
}
await next();
};
}
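Register both ahead of the /mcp route from the index.ts sketch earlier; Hono runs middleware in registration order, so the scope check only executes on verified tokens. The tickets:read baseline scope is an assumption, not something the spec mandates:
// in src/index.ts, before the /mcp route is declared
app.use("/mcp", requireAuth);
app.use("/mcp", requireScope("tickets:read")); // coarse gate; per-action checks still live in handlers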
In your tool handler, check the scope before doing the thing:
// src/tools/create_ticket.ts
import { principalForSession } from "../session.js";
export async function createTicket(
args: { title: string; body: string; priority: string },
sessionId: string,
) {
const principal = principalForSession(sessionId);
if (!principal.scopes.includes("tickets:write")) {
throw new Error("forbidden: tickets:write required");
}
// ...call the real backend
return { id: "T-12345" };
}
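The principalForSession helper imported above is glue the SDK doesn't provide, and the project layout doesn't show it. A hypothetical shape: an in-memory registry the HTTP layer fills when a session initializes. In the index.ts sketch, onsessioninitialized fires inside the /mcp handler, where the principal set by requireAuth is still in scope and can be bound:
// src/session.ts (hypothetical glue assumed by create_ticket.ts above)
export interface Principal {
  sub: string;
  scopes: string[];
}

const sessions = new Map<string, Principal>();

// call when a session initializes, while the verified principal is in hand
export function bindSession(sessionId: string, principal: Principal) {
  sessions.set(sessionId, principal);
}

export function principalForSession(sessionId: string): Principal {
  const principal = sessions.get(sessionId);
  if (!principal) throw new Error("unknown session: " + sessionId);
  return principal;
}
Like the transport map, this is per-process state: with multiple replicas it needs sticky sessions or a Redis-backed store.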
Rate Limiting Per Client
You don't want one client to drain your backend. Per-client rate limits live in Redis.
// src/ratelimit.ts
import { RateLimiterRedis } from "rate-limiter-flexible";
import Redis from "ioredis";
const redis = new Redis(process.env.REDIS_URL!);
const limiters = {
search: new RateLimiterRedis({
storeClient: redis,
keyPrefix: "mcp:search",
points: 120,
duration: 60,
}),
write: new RateLimiterRedis({
storeClient: redis,
keyPrefix: "mcp:write",
points: 20,
duration: 60,
}),
};
export async function enforce(kind: "search" | "write", clientId: string) {
await limiters[kind].consume(clientId);
}
Apply them at the top of each tool handler. When consume() rejects it throws; catch that and return a tool-level error so the client can back off, rather than letting it surface as an opaque protocol error.
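A sketch of that guard at the top of the search handler (keying on the session ID for brevity; a real deployment would key on the authenticated principal):
async (args, extra) => {
  try {
    await enforce("search", extra.sessionId ?? "anonymous");
  } catch {
    // rate-limiter-flexible rejects with a RateLimiterRes when the budget is spent
    return {
      isError: true,
      content: [{ type: "text", text: "Rate limit exceeded. Retry in a minute." }],
    };
  }
  // ...proceed with the search as before
},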
Observability With OpenTelemetry
Every tool call should produce a trace. The MCP SDK doesn't instrument for you — you add spans inside your handlers.
// src/telemetry.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { resourceFromAttributes } from "@opentelemetry/resources";
import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";
import { trace, SpanStatusCode } from "@opentelemetry/api";
const sdk = new NodeSDK({
resource: resourceFromAttributes({
[ATTR_SERVICE_NAME]: "mcp-ticketing",
}),
  // no explicit url: the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT itself and
  // appends /v1/traces (a url passed here would be used verbatim, path included)
  traceExporter: new OTLPTraceExporter(),
});
sdk.start();
const tracer = trace.getTracer("mcp-ticketing");
export async function withSpan<T>(
name: string,
attrs: Record<string, string | number>,
fn: () => Promise<T>,
): Promise<T> {
return tracer.startActiveSpan(name, { attributes: attrs }, async (span) => {
try {
const result = await fn();
span.setStatus({ code: SpanStatusCode.OK });
return result;
} catch (err) {
span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
span.recordException(err as Error);
throw err;
} finally {
span.end();
}
});
}
Wrap every tool call:
async (args, extra) => {
return withSpan(
"tool.search_tickets",
{ "mcp.client": extra.sessionId, "query.len": args.query.length },
async () => {
const results = await searchTickets(args, extra.sessionId);
return {
content: [{ type: "text", text: JSON.stringify(results) }],
};
},
);
},
Docker And Local Dev
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile
COPY . .
RUN pnpm run build
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY --from=build /app/package.json ./package.json
USER node
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s CMD node dist/healthcheck.js || exit 1
CMD ["node", "dist/index.js"]
And a compose file with Redis and a fake OIDC issuer for local dev:
services:
mcp-ticketing:
build: .
ports:
- "8080:8080"
environment:
      OIDC_ISSUER: http://oidc-mock:4444/default
      OIDC_JWKS_URL: http://oidc-mock:4444/default/jwks
REDIS_URL: redis://redis:6379
OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
depends_on: [redis, oidc-mock, otel-collector]
redis:
image: redis:7-alpine
  oidc-mock:
    image: ghcr.io/navikt/mock-oauth2-server:2.1.10
    environment:
      SERVER_PORT: "4444"
    ports:
      - "4444:4444"
otel-collector:
image: otel/opentelemetry-collector-contrib:0.112.0
command: ["--config=/etc/otel-collector-config.yaml"]
volumes:
- ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: mcp-ticketing
namespace: ai-platform
spec:
replicas: 3
selector:
matchLabels: { app: mcp-ticketing }
template:
metadata:
labels: { app: mcp-ticketing }
spec:
containers:
- name: server
image: ghcr.io/example/mcp-ticketing:1.2.0
ports:
- containerPort: 8080
env:
- name: OIDC_ISSUER
value: https://auth.example.com/
- name: OIDC_JWKS_URL
value: https://auth.example.com/.well-known/jwks.json
- name: REDIS_URL
valueFrom: { secretKeyRef: { name: mcp-redis, key: url } }
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: http://otel-collector.observability:4318
readinessProbe:
httpGet: { path: /ready, port: 8080 }
periodSeconds: 5
livenessProbe:
httpGet: { path: /live, port: 8080 }
periodSeconds: 15
resources:
requests: { cpu: 100m, memory: 128Mi }
limits: { memory: 256Mi }
---
apiVersion: v1
kind: Service
metadata:
name: mcp-ticketing
namespace: ai-platform
spec:
selector: { app: mcp-ticketing }
ports:
- port: 80
targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: mcp-ticketing
namespace: ai-platform
annotations:
cert-manager.io/cluster-issuer: letsencrypt
spec:
tls:
- hosts: [mcp-ticketing.ai.example.com]
secretName: mcp-ticketing-tls
rules:
- host: mcp-ticketing.ai.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: mcp-ticketing
port: { number: 80 }
What To Watch In Production
- p95 tool call latency. Your backend is probably the bottleneck, not the MCP layer.
- Tool call error rate per tool. A spike in one tool usually means a new model is calling it wrong.
- Rate limit rejections per client. Indicates either abuse or an agent in a retry loop.
- Auth failures. Expired tokens, missing scopes — tells you clients need guidance.
- Schema validation errors. LLMs sometimes invent arguments. Track the frequency.
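Most of these signals reduce to counters with a tool or client attribute. A sketch using the OpenTelemetry metrics API; the file name and metric names are assumptions, and the NodeSDK needs a metric reader configured for them to export:
// src/metrics.ts (hypothetical companion to telemetry.ts)
import { metrics } from "@opentelemetry/api";

const meter = metrics.getMeter("mcp-ticketing");

export const toolErrors = meter.createCounter("mcp.tool.errors", {
  description: "Tool call failures, by tool and reason",
});
export const rateLimitRejections = meter.createCounter("mcp.ratelimit.rejections", {
  description: "Rate limit rejections, by client",
});

// in a handler's catch block:
// toolErrors.add(1, { tool: "search_tickets", reason: "validation" });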
Common Mistakes
- Exposing too much. Each tool multiplies the attack surface. Start with read-only.
- No authorization inside handlers. Transport auth is not enough — check scope per action.
- Returning giant blobs. Models have finite context. Paginate, summarize, filter.
- Forgetting idempotency. Write tools may be called twice on retry. Use idempotency keys (see the sketch after this list).
- No audit log. Every write action should be logged with principal, arguments, and outcome.
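The idempotency bullet deserves code. One hedged approach: hash the principal and arguments into a key and claim it in Redis with SET NX, reusing the ioredis client from ratelimit.ts. onceOnly is a name invented here:
// src/idempotency.ts (hypothetical helper for write tools)
import { createHash } from "node:crypto";
import type Redis from "ioredis";

// returns true the first time a (principal, args) pair is seen within the TTL
export async function onceOnly(
  redis: Redis,
  principal: string,
  args: unknown,
  ttlSeconds = 600,
): Promise<boolean> {
  const hash = createHash("sha256")
    .update(principal + JSON.stringify(args))
    .digest("hex");
  // SET ... NX returns null when the key already exists, i.e. a duplicate call
  const claimed = await redis.set(`mcp:idem:${hash}`, "1", "EX", ttlSeconds, "NX");
  return claimed === "OK";
}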
Next Steps
MCP is where tool integration with LLMs is going, and investing a week to expose your internal APIs through a production-grade MCP server pays back many times over when agents — yours and your customers' — can safely use them. Start with a single read-only tool, get auth and observability right, and expand from there. If you want help designing MCP servers for your platform or auditing an existing one, get in touch.