AI · Apr 18, 2026 · 9 min read · Ribbsaeter Systems Engineering

Building Custom MCP Servers for Enterprise AI: A Practical Guide

How Model Context Protocol (MCP) servers expose internal tools and data to LLM agents in production — architecture, authentication, security and the mistakes we've stopped making.

Key takeaways

01 MCP is the de facto standard for letting LLM agents use your tools; Anthropic, OpenAI and Google have all adopted it.
02 Treat the MCP server as a security boundary: every tool call is an authenticated, auditable RPC.
03 Limit blast radius with per-tool scopes, time-boxed credentials and sandboxed execution.
04 Cache aggressively at the tool layer: agents repeat themselves more than humans do.
05 Ship observability before you ship features: prompt, tool call, response, latency, cost.

If you have spent the last twelve months wiring custom function-calling glue between your LLM and a half-dozen internal APIs, the Model Context Protocol is the cleanup you have been looking for. MCP is an open, JSON-RPC-based standard for letting LLM agents invoke tools and read resources from any system that speaks it. We have shipped MCP servers for a fintech compliance team, a logistics operator and an internal developer platform, and the patterns are starting to look the same.

What is MCP, in one paragraph

Model Context Protocol is a client–server protocol where the LLM application (Claude Desktop, Cursor, an in-house agent) is the client and the MCP server exposes a typed catalogue of tools, resources and prompts. The client connects, lists what is available, and calls the operations it needs. The protocol itself is small. The interesting work is everything around it: who is allowed to call what, how the call is authorised, what gets logged, and what happens when the agent decides to call the same expensive tool 200 times in a loop.
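To make the list-then-call exchange concrete, here is a minimal sketch of a dispatcher for the two JSON-RPC methods involved, `tools/list` and `tools/call`. It is a simplification written against the standard library only, not the official SDK: the `lookup_order` tool and its handler are hypothetical, and the `initialize` handshake that opens a real session is left out.

```python
import json
from typing import Any, Callable

# Hypothetical catalogue: tool name -> (description, JSON Schema for arguments, handler).
TOOLS: dict[str, tuple[str, dict[str, Any], Callable[..., str]]] = {
    "lookup_order": (
        "Look up the status of an order",
        {"type": "object",
         "properties": {"order_id": {"type": "string"}},
         "required": ["order_id"]},
        lambda order_id: f"order {order_id}: shipped",
    ),
}


def _error(request_id: Any, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "error": {"code": code, "message": message}})


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request and return the serialised response."""
    request = json.loads(raw)
    request_id = request.get("id")
    method = request.get("method")
    params = request.get("params") or {}

    if method == "tools/list":
        # The client discovers what is available, including the argument schema.
        result: dict[str, Any] = {"tools": [
            {"name": name, "description": description, "inputSchema": schema}
            for name, (description, schema, _handler) in TOOLS.items()
        ]}
    elif method == "tools/call":
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return _error(request_id, -32602, "Unknown tool")
        text = entry[2](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(request_id, -32601, "Method not found")

    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
```

In practice the SDKs handle this framing for you; the sketch only shows how little the wire protocol itself asks of a server.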

When you should build a custom MCP server

You have an internal data source (CRM, warehouse, design system, document store) that an off-the-shelf connector does not cover.
You want a single, audited surface that multiple AI tools can consume — Cursor for engineering, Claude Desktop for ops, an in-house chat agent for support.
Your security team has rejected the suggestion of giving an LLM a long-lived production database password.
You want to deprecate a fragile collection of bespoke function-calling integrations.

Reference architecture we use in production

Every enterprise MCP server we ship is a thin Node or Python service in front of three concentric layers: an authentication and authorisation layer, a tool registry, and an execution sandbox. An HTTPS edge sits in front of them and observability cuts across all of them. The pattern is intentionally boring.

Edge: a load-balanced HTTPS endpoint protected by mutual TLS or OAuth 2.1.
Auth: a token-introspection step that returns the calling user, their groups and the scopes they hold.
Tool registry: a typed list of tools annotated with required scopes, rate limits and a cost class.
Execution: each tool runs in a sandboxed worker — for code execution this means an ephemeral container; for data tools it means a query runner with row-level filters bound to the caller's identity.
Observability: every call emits a structured event (caller, tool, arguments hash, latency, outcome, cost) to your log pipeline.
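The registry layer in the list above can be sketched as follows. This is an illustration under stated assumptions rather than our production code: the class names (`Caller`, `Tool`, `Registry`, `RateLimited`), the sliding one-minute window and the `cost_class` labels are all hypothetical, and the `Caller` object stands in for whatever the token-introspection step returns.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Caller:
    """What the auth layer hands over after token introspection."""
    user: str
    groups: frozenset
    scopes: frozenset


@dataclass(frozen=True)
class Tool:
    name: str
    required_scopes: frozenset
    max_calls_per_minute: int
    cost_class: str  # hypothetical labels, e.g. "cheap" or "expensive"
    handler: Callable[..., Any]


class RateLimited(Exception):
    pass


class Registry:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._tools: dict = {}
        self._clock = clock
        self._recent = defaultdict(deque)  # (user, tool) -> timestamps of recent calls

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def visible_to(self, caller: Caller) -> list:
        """Only advertise tools the caller holds every required scope for."""
        return sorted(t.name for t in self._tools.values()
                      if t.required_scopes <= caller.scopes)

    def call(self, caller: Caller, name: str, arguments: dict) -> Any:
        tool = self._tools.get(name)
        if tool is None or not tool.required_scopes <= caller.scopes:
            raise PermissionError(f"{caller.user} may not call {name}")
        # Sliding window: drop timestamps older than sixty seconds, then count.
        now = self._clock()
        window = self._recent[(caller.user, name)]
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= tool.max_calls_per_minute:
            raise RateLimited(f"{name}: more than {tool.max_calls_per_minute} calls per minute")
        window.append(now)
        return tool.handler(**arguments)
```

Keeping scope checks and rate limits in the registry, rather than in each handler, means a new tool cannot be registered without declaring both.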

Authentication that survives an audit

We standardise on OAuth 2.1 with the Authorization Code + PKCE flow for human-driven clients (Cursor, Claude Desktop) and on short-lived signed JWTs for agent-to-agent service accounts. Two rules we now treat as non-negotiable: tokens never live longer than fifteen minutes, and every token has explicit scopes that map one-to-one with tool names. A token that can read invoices cannot also send refunds.
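The two rules can be expressed as checks on the token itself. The sketch below hand-rolls an HS256-signed JWT with the standard library purely to keep the example self-contained; that choice is an assumption for illustration, and a real deployment would more likely use a vetted JWT library with asymmetric keys. The function names `issue` and `authorise`, and the space-separated `scope` claim, are likewise hypothetical.

```python
import base64
import hashlib
import hmac
import json

MAX_TTL_SECONDS = 15 * 60  # rule one: no token outlives fifteen minutes


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(key: bytes, signing_input: str) -> bytes:
    return hmac.new(key, signing_input.encode(), hashlib.sha256).digest()


def issue(key: bytes, subject: str, scopes: list, now: float, ttl: int = 300) -> str:
    """Mint a short-lived token whose scopes are tool names."""
    if ttl > MAX_TTL_SECONDS:
        raise ValueError("token lifetime exceeds fifteen minutes")
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = _b64(json.dumps({"sub": subject, "scope": " ".join(scopes),
                              "iat": int(now), "exp": int(now) + ttl}).encode())
    return f"{header}.{claims}.{_b64(_sign(key, f'{header}.{claims}'))}"


def authorise(key: bytes, token: str, tool_name: str, now: float) -> str:
    """Return the subject if the token may call tool_name, else raise PermissionError."""
    try:
        header, claims, signature = token.split(".")
        if not hmac.compare_digest(_sign(key, f"{header}.{claims}"), _unb64(signature)):
            raise PermissionError("bad signature")
        payload = json.loads(_unb64(claims))
    except ValueError:
        raise PermissionError("malformed token") from None
    if payload["exp"] - payload["iat"] > MAX_TTL_SECONDS:
        raise PermissionError("token lifetime exceeds fifteen minutes")
    if not payload["iat"] <= now < payload["exp"]:
        raise PermissionError("token expired or not yet valid")
    # Rule two: scopes map one-to-one to tool names.
    if tool_name not in payload["scope"].split():
        raise PermissionError(f"token lacks scope {tool_name}")
    return payload["sub"]
```

With this shape, a token issued with the `read_invoice` scope is rejected for `send_refund` without the tool handler ever running.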

Sandboxing tool execution

Even read-only tools should be treated as untrusted. SQL tools run as a least-privilege role with row-level security. File-system tools are confined to a per-session writable mount. Network tools have an egress allow-list. Code-execution tools run in a fresh Firecracker microVM that is destroyed after each call. The cost of provisioning is tiny compared to the cost of explaining to your customer why their data leaked into a model context window.
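Two of these guards are easy to show at the application level. The sketch below confines file paths to a per-session root and checks outbound URLs against an allow-list; the function names and the `orders.internal.example` host are hypothetical, and checks like these complement, rather than replace, enforcement by the container, mount and network policy.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical allow-list; in practice this would come from configuration.
ALLOWED_HOSTS = frozenset({"orders.internal.example"})


def resolve_in_session(session_root: Path, requested: str) -> Path:
    """Resolve a tool-supplied path and refuse anything outside the session mount."""
    root = session_root.resolve()
    # resolve() collapses ".." segments and follows symlinks before we compare.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{requested!r} escapes the session mount")
    return candidate


def check_egress(url: str) -> None:
    """Allow only HTTPS requests to hosts on the allow-list."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
        raise PermissionError(f"egress to {url!r} is not allowed")
```

Both functions fail closed: anything that cannot be shown to be inside the boundary is rejected.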

Cost and latency: caching where it counts

Agents rerun tools far more than humans do. The first deploy of a customer-support MCP server in our portfolio called the same order-lookup endpoint forty-eight times in a single ten-minute conversation. We now cache by argument-hash with TTLs tuned per tool and we expose a cache-bypass flag for cases where freshness matters. Cache hit rates of 60 to 80 percent are realistic for read-heavy workloads.
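A minimal sketch of that cache, assuming tool arguments are JSON-serialisable: the key is the tool name plus a SHA-256 hash of the canonicalised arguments, each call supplies its own TTL, and `bypass=True` forces a fresh call. The `ToolCache` class and its in-memory dictionary are hypothetical stand-ins for whatever store you use.

```python
import hashlib
import json
import time
from typing import Any, Callable


class ToolCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict = {}  # (tool, argument hash) -> (expiry, value)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _argument_hash(arguments: dict) -> str:
        # Sorting keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} hash the same.
        canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def call(self, tool: str, fn: Callable[..., Any], arguments: dict,
             ttl: float, bypass: bool = False) -> Any:
        key = (tool, self._argument_hash(arguments))
        now = self._clock()
        cached = self._entries.get(key)
        if not bypass and cached is not None and cached[0] > now:
            self.hits += 1
            return cached[1]
        self.misses += 1
        value = fn(**arguments)
        # A bypassed call still refreshes the entry for later callers.
        self._entries[key] = (now + ttl, value)
        return value
```

If a tool's result depends on who is asking, for example because of row-level filters, the caller's identity or scopes need to be part of the key as well, otherwise one user's result can be served to another.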

Observability is the feature

When something goes wrong with an agent — and it will — you want to be able to answer four questions: what prompt produced this call, what arguments did the agent pass, what did the tool return, and how much did it cost. We emit one structured log line per call into a Loki/Grafana stack and replay sessions in a small internal dashboard. This is what turns an MCP server from a black box into something operations will actually trust.
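One way to produce that log line is a small wrapper around every tool call. The sketch below is an assumption about shape, not our exact schema: the field names, the `session_id` used to tie a call back to the prompt that produced it, and the flat `cost_usd` figure are all hypothetical, and `emit` defaults to `print` where a real service would write to its log pipeline.

```python
import hashlib
import json
import time
from typing import Any, Callable


def traced_call(session_id: str, caller: str, tool: str, arguments: dict,
                fn: Callable[..., Any], cost_usd: float,
                emit: Callable[[str], Any] = print,
                clock: Callable[[], float] = time.perf_counter) -> Any:
    """Run a tool and emit exactly one structured event, whether it succeeds or fails."""
    started = clock()
    outcome = "ok"
    try:
        return fn(**arguments)
    except Exception as exc:
        outcome = f"error:{type(exc).__name__}"
        raise
    finally:
        canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
        emit(json.dumps({
            "session_id": session_id,
            "caller": caller,
            "tool": tool,
            # Hash rather than log raw arguments, which may contain personal data.
            "arguments_hash": hashlib.sha256(canonical.encode()).hexdigest(),
            "latency_ms": round((clock() - started) * 1000, 3),
            "outcome": outcome,
            "cost_usd": cost_usd,
        }, sort_keys=True))
```

Emitting from `finally` guarantees that failed calls are logged too, which is usually when the record matters most.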

Common mistakes we have stopped making

Letting agents execute SQL strings directly. Always wrap with a parameterised, scope-checked query builder.
Returning raw database rows. Project to a stable JSON contract; a column rename should never reach the agent.
Skipping rate limits per tool. One runaway agent loop will burn through your quota in minutes.
Trusting the model to redact secrets. Filter at the server, not in the prompt.
Forgetting to version the tool schema. Treat it like a public API — clients break when you break.
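The first two fixes in this list fit in a few lines. The sketch below uses SQLite from the standard library as a stand-in for the real database; the `orders` table, its columns, the `read_orders` scope and the field names in `ORDER_CONTRACT` are all hypothetical.

```python
import sqlite3

# Stable contract: public field name -> current column name.
# A column rename changes the right-hand side only; the agent never sees it.
ORDER_CONTRACT = {"orderId": "id", "status": "state", "totalCents": "total_cents"}


def lookup_orders(conn: sqlite3.Connection, caller_scopes: set, customer_id: str) -> list:
    """Scope-checked, parameterised lookup that returns the public contract only."""
    if "read_orders" not in caller_scopes:
        raise PermissionError("caller lacks the read_orders scope")
    # Column names come from a server-side constant, never from the agent.
    columns = ", ".join(ORDER_CONTRACT.values())
    rows = conn.execute(
        f"SELECT {columns} FROM orders WHERE customer_id = ? ORDER BY id",
        (customer_id,),  # bound as a parameter, so it is never parsed as SQL
    ).fetchall()
    return [dict(zip(ORDER_CONTRACT.keys(), row)) for row in rows]
```

The agent supplies a value, not a query: an argument such as `c1' OR '1'='1` is compared as a literal customer id and simply matches nothing.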

How we deploy

Our default stack is TypeScript on Node 20, the official @modelcontextprotocol/sdk, Fastify behind an OAuth 2.1 proxy, observability through OpenTelemetry, and Postgres for the audit log. For Python-heavy data shops we run the official mcp Python SDK on uvicorn with the same auth and observability tier. Both ship as a single container, deploy to Kubernetes or Cloud Run, and expose health and metrics endpoints out of the box.

Where MCP is going

Through 2026 the protocol is converging on a few practical extensions: streaming tool responses, structured progress reporting, capability negotiation for cost-aware agents, and a stable identity propagation story so that downstream systems can attribute actions to the human behind the agent. The teams who get the foundations right today — auth, scopes, sandboxing, observability — get those upgrades almost for free.

Frequently asked questions

Direct answers to questions readers and AI assistants commonly ask about this topic.

What is the Model Context Protocol (MCP)?

The Model Context Protocol is an open standard, introduced by Anthropic in November 2024, for connecting LLM applications to external tools and data through a typed, JSON-RPC interface. It has since been adopted by OpenAI, Google and the major IDEs.

Do I need to build my own MCP server?

Only if you have internal tools, data or workflows that no off-the-shelf connector covers, or if your security posture requires you to own the boundary between LLMs and your systems. For public SaaS data, prefer the official connector if one exists.

How is MCP different from OpenAI function calling?

Function calling is an API feature of a single model provider. MCP is a vendor-neutral protocol: one server, many clients (Claude, ChatGPT, Cursor, custom agents). It also defines resources and prompts, not just tool calls.

Is MCP safe for production?

MCP defines how clients and servers talk, not what they are allowed to do; safety is a property of your implementation. With OAuth 2.1, scoped tokens, sandboxed execution and complete audit logging, MCP servers are running in production at fintechs, healthcare platforms and SaaS companies today.

What language should I write an MCP server in?

TypeScript and Python both have official SDKs and roughly equal capability. Pick the language closest to the systems you are integrating: TypeScript for web platforms, Python for data and ML pipelines.

Last updated: April 26, 2026 · Written by Ribbsaeter Systems Engineering · AI & Platform Engineering