AI Infrastructure · 9 min read

MCP Server Implementation Guide: Model Context Protocol for Production

Build a production-grade Model Context Protocol server in TypeScript with authentication, rate limiting, observability, and Kubernetes deployment.

Why MCP Matters

The Model Context Protocol was introduced by Anthropic in late 2024 and has since become the closest thing to a standard for connecting LLM applications to tools, data sources, and external systems. In 2026 it's supported by Claude Desktop, Claude Code, Cursor, Zed, OpenAI's recent clients, and a long tail of agents. If you maintain an internal API that AI assistants should be able to call safely — a ticketing system, a runbook store, a metrics backend — exposing it through MCP is the lowest-friction path.

This post builds the production-grade version of an MCP server, not a hello world: authentication, authorization, rate limiting, observability, and deployment. The examples use the official TypeScript SDK, @modelcontextprotocol/sdk.

The Five Concepts You Need

MCP defines a small vocabulary. Learn these five and you understand 90% of the spec.

  1. Server. A process that exposes capabilities.
  2. Client. An LLM application that connects to servers.
  3. Tools. Callable functions with typed arguments the model can invoke.
  4. Resources. Read-only data sources (files, records, pages) the model can include in context.
  5. Prompts. Parameterized prompt templates the server exposes to the client.

Most production MCP servers are really about tools and resources. Prompts are useful but secondary.

Transport: stdio Or Streamable HTTP

MCP supports multiple transports. For local development, stdio is the obvious choice: the client spawns the server as a subprocess and they talk over stdin/stdout. For production, you want Streamable HTTP, the network-friendly transport the spec standardized in 2025 to replace the original HTTP+SSE pairing. It supports multiple clients, TLS, auth headers, and load balancers.

If your server exists to be consumed by agents running on engineers' laptops, ship both transports. If it exists to be consumed by a production agent platform, ship only HTTP.

Project Layout

mcp-ticketing/
├── src/
│   ├── index.ts          # entrypoint
│   ├── server.ts         # server setup
│   ├── auth.ts           # auth middleware
│   ├── tools/
│   │   ├── search.ts
│   │   ├── create_ticket.ts
│   │   └── close_ticket.ts
│   ├── resources/
│   │   └── ticket.ts
│   ├── ratelimit.ts
│   └── telemetry.ts
├── Dockerfile
├── docker-compose.yaml
├── k8s/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ingress.yaml
├── package.json
└── tsconfig.json

A Minimal Server With The Official SDK

The official @modelcontextprotocol/sdk package provides the protocol plumbing. You implement tools and resources; the SDK handles the rest.

// src/server.ts
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { z } from "zod";
import { searchTickets, createTicket, closeTicket } from "./tools/index.js";
import { readTicketResource } from "./resources/ticket.js";

export function buildServer(): McpServer {
  const server = new McpServer(
    {
      name: "ticketing",
      version: "1.2.0",
    },
    {
      capabilities: {
        tools: {},
        resources: { subscribe: false, listChanged: true },
      },
    },
  );

  server.registerTool(
    "search_tickets",
    {
      title: "Search tickets",
      description: "Search the ticketing system by free text and optional status",
      inputSchema: {
        query: z.string().min(1).max(500),
        status: z.enum(["open", "closed", "pending"]).optional(),
        limit: z.number().int().min(1).max(50).default(10),
      },
    },
    async (args, extra) => {
      const results = await searchTickets(args, extra.sessionId);
      return {
        content: [
          { type: "text", text: JSON.stringify(results, null, 2) },
        ],
      };
    },
  );

  server.registerTool(
    "create_ticket",
    {
      title: "Create ticket",
      description: "Create a new ticket. Requires write permission.",
      inputSchema: {
        title: z.string().min(1).max(200),
        body: z.string().min(1).max(10_000),
        priority: z.enum(["low", "medium", "high"]).default("medium"),
      },
    },
    async (args, extra) => {
      const ticket = await createTicket(args, extra.sessionId);
      return {
        content: [{ type: "text", text: `Created ticket ${ticket.id}` }],
      };
    },
  );

  server.registerTool(
    "close_ticket",
    {
      title: "Close ticket",
      description: "Close an existing ticket by ID.",
      inputSchema: {
        id: z.string().regex(/^T-\d+$/),
        resolution: z.string().max(1000),
      },
    },
    async (args, extra) => {
      await closeTicket(args, extra.sessionId);
      return { content: [{ type: "text", text: `Closed ${args.id}` }] };
    },
  );

  server.registerResource(
    "ticket",
    // Templated URIs need a ResourceTemplate (exported from the same
    // mcp.js module); a plain string registers a single fixed URI.
    new ResourceTemplate("ticket://{id}", { list: undefined }),
    {
      title: "Ticket",
      description: "Read a ticket by ID",
      mimeType: "application/json",
    },
    async (uri, { id }) => {
      const ticket = await readTicketResource(String(id));
      return {
        contents: [
          {
            uri: uri.href,
            mimeType: "application/json",
            text: JSON.stringify(ticket),
          },
        ],
      };
    },
  );

  return server;
}

Note the extra.sessionId — the SDK exposes a per-request context you can use to carry authenticated user identity into your tool handlers.

Authentication: OAuth, Not API Keys

The MCP spec recommends OAuth 2.1 for HTTP transports. In practice you have three choices:

  1. OAuth 2.1 with PKCE — correct, flexible, most work.
  2. OIDC from your existing IdP — good if your clients already have user identity.
  3. Short-lived bearer tokens minted by a trusted control plane — easiest for machine-to-machine.

For a tool that represents authenticated user actions (creating tickets on someone's behalf), OAuth with the user's consent is the right model. For a tool that's scoped to a single service identity, signed bearer tokens are fine.

A minimal JWT verification middleware using Hono:

// src/auth.ts
import { jwtVerify, createRemoteJWKSet } from "jose";
import type { Context, Next } from "hono";

const JWKS = createRemoteJWKSet(new URL(process.env.OIDC_JWKS_URL!));

export async function requireAuth(c: Context, next: Next) {
  const authHeader = c.req.header("authorization");
  if (!authHeader?.startsWith("Bearer ")) {
    return c.json({ error: "missing bearer token" }, 401);
  }

  try {
    const { payload } = await jwtVerify(authHeader.slice(7), JWKS, {
      issuer: process.env.OIDC_ISSUER,
      audience: "mcp-ticketing",
    });

    c.set("principal", {
      sub: payload.sub as string,
      scopes: (payload.scope as string | undefined)?.split(" ") ?? [],
    });
  } catch {
    return c.json({ error: "invalid token" }, 401);
  }

  await next();
}

export function requireScope(scope: string) {
  return async (c: Context, next: Next) => {
    const principal = c.get("principal") as { scopes: string[] } | undefined;
    if (!principal?.scopes.includes(scope)) {
      return c.json({ error: `missing scope: ${scope}` }, 403);
    }
    await next();
  };
}

In your tool handler, check the scope before doing the thing:

// src/tools/create_ticket.ts
import { principalForSession } from "../session.js";

export async function createTicket(
  args: { title: string; body: string; priority: string },
  sessionId: string,
) {
  const principal = principalForSession(sessionId);
  if (!principal.scopes.includes("tickets:write")) {
    throw new Error("forbidden: tickets:write required");
  }
  // ...call the real backend
  return { id: "T-12345" };
}
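The session.js module imported above isn't in the project layout; a minimal sketch follows. It is in-memory and single-process (swap the Map for Redis if you run replicas), and the Principal shape is an assumption mirroring what the auth middleware stores.

```typescript
// src/session.ts (sketch) -- map MCP session IDs to authenticated principals.
export interface Principal {
  sub: string;
  scopes: string[];
}

const sessions = new Map<string, Principal>();

// Call this when the HTTP layer authenticates a new MCP session.
export function bindSession(sessionId: string, principal: Principal): void {
  sessions.set(sessionId, principal);
}

// Tool handlers call this; throwing on unknown sessions fails closed.
export function principalForSession(sessionId: string): Principal {
  const principal = sessions.get(sessionId);
  if (!principal) {
    throw new Error(`no principal bound for session ${sessionId}`);
  }
  return principal;
}

// Clean up when the transport closes, or the map leaks entries.
export function dropSession(sessionId: string): void {
  sessions.delete(sessionId);
}
```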

Rate Limiting Per Client

You don't want one client to drain your backend. Per-client rate limits live in Redis so they hold across replicas.

// src/ratelimit.ts
import { RateLimiterRedis } from "rate-limiter-flexible";
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL!);

const limiters = {
  search: new RateLimiterRedis({
    storeClient: redis,
    keyPrefix: "mcp:search",
    points: 120,
    duration: 60,
  }),
  write: new RateLimiterRedis({
    storeClient: redis,
    keyPrefix: "mcp:write",
    points: 20,
    duration: 60,
  }),
};

export async function enforce(kind: "search" | "write", clientId: string) {
  await limiters[kind].consume(clientId);
}

Apply them at the top of each tool handler. When a limiter rejects, consume() throws; catch it in the handler and return an MCP tool error so the client knows to back off rather than retry hot.
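The handler-side shape of that check can be sketched as below. The Redis limiter is stubbed by an in-memory fixed window so the example is self-contained; the real enforce() from ratelimit.ts drops into the same place, and the ToolResult type is a local simplification of the SDK's result shape.

```typescript
// Sketch: apply a rate limit inside a tool handler and surface
// rejections as MCP tool errors instead of unhandled exceptions.
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// Stand-in for the Redis-backed enforce(): fixed window, per client.
const hits = new Map<string, { count: number; resetAt: number }>();
export function enforce(
  kind: string,
  clientId: string,
  limit = 120,
  windowMs = 60_000,
): void {
  const key = `${kind}:${clientId}`;
  const now = Date.now();
  const entry = hits.get(key);
  if (!entry || now >= entry.resetAt) {
    hits.set(key, { count: 1, resetAt: now + windowMs });
    return;
  }
  if (++entry.count > limit) throw new Error("rate limited");
}

export async function withRateLimit(
  kind: "search" | "write",
  clientId: string,
  limit: number,
  fn: () => Promise<ToolResult>,
): Promise<ToolResult> {
  try {
    enforce(kind, clientId, limit);
  } catch {
    // isError marks the call as failed; the text gives the model
    // something actionable so an agent backs off instead of retrying hot.
    return {
      isError: true,
      content: [{ type: "text", text: "Rate limit exceeded; retry after 60s." }],
    };
  }
  return fn();
}
```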

Observability With OpenTelemetry

Every tool call should produce a trace. The MCP SDK doesn't instrument for you — you add spans inside your handlers.

// src/telemetry.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { resourceFromAttributes } from "@opentelemetry/resources";
import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";
import { trace, SpanStatusCode } from "@opentelemetry/api";

const sdk = new NodeSDK({
  resource: resourceFromAttributes({
    [ATTR_SERVICE_NAME]: "mcp-ticketing",
  }),
  // No explicit url: the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT itself
  // and appends /v1/traces. Passing the bare endpoint as `url` would be
  // used verbatim and miss the collector's traces path.
  traceExporter: new OTLPTraceExporter(),
});

sdk.start();

const tracer = trace.getTracer("mcp-ticketing");

export async function withSpan<T>(
  name: string,
  attrs: Record<string, string | number>,
  fn: () => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, { attributes: attrs }, async (span) => {
    try {
      const result = await fn();
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
      span.recordException(err as Error);
      throw err;
    } finally {
      span.end();
    }
  });
}

Wrap every tool call:

async (args, extra) => {
  return withSpan(
    "tool.search_tickets",
    { "mcp.client": extra.sessionId, "query.len": args.query.length },
    async () => {
      const results = await searchTickets(args, extra.sessionId);
      return {
        content: [{ type: "text", text: JSON.stringify(results) }],
      };
    },
  );
},

Docker And Local Dev

FROM node:20-alpine AS build
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile
COPY . .
RUN pnpm run build
# Drop devDependencies so the runtime image doesn't ship the toolchain.
RUN pnpm prune --prod

FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY --from=build /app/package.json ./package.json
USER node
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s CMD node dist/healthcheck.js || exit 1
CMD ["node", "dist/index.js"]
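The HEALTHCHECK line references dist/healthcheck.js, which the project layout above doesn't show. A minimal sketch, assuming Node 20's global fetch and port 8080 to match EXPOSE:

```typescript
// src/healthcheck.ts (sketch) -- compiled to dist/healthcheck.js
// for the Docker HEALTHCHECK.

// Pure predicate so the status logic is testable on its own.
export function isHealthy(status: number): boolean {
  return status >= 200 && status < 300;
}

async function probe(): Promise<void> {
  // AbortSignal.timeout keeps a wedged server from hanging the probe
  // past Docker's own --timeout.
  const res = await fetch("http://127.0.0.1:8080/live", {
    signal: AbortSignal.timeout(2_500),
  });
  if (!isHealthy(res.status)) process.exit(1);
}

// Only run when executed directly, not when imported.
if (process.argv[1]?.endsWith("healthcheck.js")) {
  probe().catch(() => process.exit(1));
}
```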

And a compose file with Redis and a fake OIDC issuer for local dev:

services:
  mcp-ticketing:
    build: .
    ports:
      - "8080:8080"
    environment:
      OIDC_ISSUER: http://oidc-mock:4444/
      OIDC_JWKS_URL: http://oidc-mock:4444/.well-known/jwks.json
      REDIS_URL: redis://redis:6379
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
    depends_on: [redis, oidc-mock, otel-collector]

  redis:
    image: redis:7-alpine

  oidc-mock:
    image: ghcr.io/navikt/mock-oauth2-server:2.1.10
    ports:
      - "4444:4444"

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.112.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-ticketing
  namespace: ai-platform
spec:
  replicas: 3
  selector:
    matchLabels: { app: mcp-ticketing }
  template:
    metadata:
      labels: { app: mcp-ticketing }
    spec:
      containers:
        - name: server
          image: ghcr.io/example/mcp-ticketing:1.2.0
          ports:
            - containerPort: 8080
          env:
            - name: OIDC_ISSUER
              value: https://auth.example.com/
            - name: OIDC_JWKS_URL
              value: https://auth.example.com/.well-known/jwks.json
            - name: REDIS_URL
              valueFrom: { secretKeyRef: { name: mcp-redis, key: url } }
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://otel-collector.observability:4318
          readinessProbe:
            httpGet: { path: /ready, port: 8080 }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /live, port: 8080 }
            periodSeconds: 15
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits: { memory: 256Mi }
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-ticketing
  namespace: ai-platform
spec:
  selector: { app: mcp-ticketing }
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-ticketing
  namespace: ai-platform
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  tls:
    - hosts: [mcp-ticketing.ai.example.com]
      secretName: mcp-ticketing-tls
  rules:
    - host: mcp-ticketing.ai.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mcp-ticketing
                port: { number: 80 }

What To Watch In Production

  • p95 tool call latency. Your backend is probably the bottleneck, not the MCP layer.
  • Tool call error rate per tool. A spike in one tool usually means a new model is calling it wrong.
  • Rate limit rejections per client. Indicates either abuse or an agent in a retry loop.
  • Auth failures. Expired tokens, missing scopes — tells you clients need guidance.
  • Schema validation errors. LLMs sometimes invent arguments. Track the frequency.

Common Mistakes

  • Exposing too much. Each tool multiplies the attack surface. Start with read-only.
  • No authorization inside handlers. Transport auth is not enough — check scope per action.
  • Returning giant blobs. Models have finite context. Paginate, summarize, filter.
  • Forgetting idempotency. Write tools may be called twice on retry. Use idempotency keys.
  • No audit log. Every write action should be logged with principal, arguments, and outcome.
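The idempotency point deserves a sketch: derive a deterministic key from the principal, tool, and arguments, and short-circuit replays. In-memory here for illustration; production wants Redis with SET NX and a TTL, and both helper names are assumptions.

```typescript
import { createHash } from "node:crypto";

// Deterministic key: same principal + tool + arguments => same key,
// so a client-side retry maps onto the original write. Sorts top-level
// argument keys; nested objects are assumed to serialize stably.
export function idempotencyKey(
  principal: string,
  tool: string,
  args: Record<string, unknown>,
): string {
  const canonical = JSON.stringify(
    Object.fromEntries(Object.entries(args).sort()),
  );
  return createHash("sha256")
    .update(`${principal}\n${tool}\n${canonical}`)
    .digest("hex");
}

// In-memory replay cache; production wants Redis with a TTL instead.
const seen = new Map<string, unknown>();

// Runs fn once per key and returns the cached result on replays.
export async function idempotent<T>(
  key: string,
  fn: () => Promise<T>,
): Promise<T> {
  if (seen.has(key)) return seen.get(key) as T;
  const result = await fn();
  seen.set(key, result);
  return result;
}
```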

Next Steps

MCP is where tool integration with LLMs is going, and investing a week to expose your internal APIs through a production-grade MCP server pays back many times over when agents — yours and your customers' — can safely use them. Start with a single read-only tool, get auth and observability right, and expand from there. If you want help designing MCP servers for your platform or auditing an existing one, get in touch.
