Rent the Intelligence, Keep the Data

Development | Denis Susac


Sunday, Jun 14, 2026 • 13 min read

The 2026 World Cup turned every sports platform into a firehose of natural-language questions. Here's how we built a grounded, multilingual sports assistant on Dokko — and why the smartest architectural decision was deciding what NOT to build.

The 2026 World Cup kicked off a few days ago, and if you run a sports platform you already know what that does to your traffic. It’s not just more of it. It’s a different shape of it. People who would normally tap through a fixtures list are suddenly typing full sentences: “who’s missing for the match this weekend?”, “how have these two done against each other lately?”, “what’s on tonight in my time zone?”. A tournament compresses a season’s worth of curiosity into a month, and most platforms answer it with the same search box and filter dropdowns they had in May.

That gap — between how people now ask and how products still answer — is the thing conversational AI is actually good at closing. So when a sports information platform came to us wanting a player-facing assistant that could field those questions in natural language, in four languages, streamed back in real time, the interesting work wasn’t “add an LLM.” It was deciding who owns which half of the problem.

This post is about that decision, because I think it’s the one most teams get wrong. The assistant we built runs on Dokko, our document-and-data AI platform, and the cleanest way to describe the architecture is a division of labour: Dokko owns the intelligence; the platform keeps its data and its rules. Neither side reaches into the other’s territory. That boundary is the whole design, and once it’s drawn correctly, almost everything else falls out of it for free.

The Build-or-Borrow Trap

Here’s the default way this project goes wrong. A team decides they want an AI assistant, they sign up for a model provider’s API, and they start building. Within a week they discover that “call the LLM” was the easy 10%. The other 90% is everything around it: session management so a conversation has memory, message history so a reload doesn’t wipe the chat, streaming infrastructure so answers type out instead of arriving in a thirty-second lump, multi-language handling, token accounting so you can meter usage, retry and reconnect logic, and a place to actually configure the model’s behaviour that isn’t a string literal in your backend.

None of that is your product. If you run a sports platform, your product is the sports data and the trust users place in it. Every week you spend building streaming plumbing is a week you didn’t spend on the thing only you can build.

The opposite failure is just as common: hand the whole problem to a third party, push your data into their environment, and accept that your domain rules now live in someone else’s config screen and your users' questions now travel through someone else’s logging. In a regulated vertical that’s a non-starter, and honestly it should make anyone uncomfortable.

The division-of-labour answer rejects both. Ask, for each capability: is this something only we can do? For the conversational stack (sessions, streaming, history, language handling, token metering), the answer is no. It’s undifferentiated heavy lifting that looks identical whether you’re a sports platform or a healthcare portal, and it belongs to Dokko. The live sports data, the catalogue of what’s actually on offer, and the rules about what the assistant may and may not say are the part that is the platform, and they never leave the platform’s perimeter.

So the platform team built exactly two things: a thin integration proxy in front of Dokko, and an MCP server that exposes their live sports data as tools. They built no prompt orchestration, no session store, no streaming server, no language router. They borrowed the intelligence and kept the data.

The Architecture at a Glance

Four components participate in every conversation, and the arrows between them are the contract.

Architecture of the sports assistant — a player talks to the platform's integration proxy over REST and SignalR; the proxy talks to Dokko, which calls the platform's own MCP server for live sports data. The data never leaves the operator's perimeter.

The integration proxy is the platform’s backend service. It authenticates the user, applies the operator’s own input checks and usage quotas, forwards approved prompts to Dokko, subscribes to the answer stream, and relays it to the client over SignalR. It is deliberately thin — it orchestrates, it doesn’t think.
Dokko receives the prompt, runs it against a configured agent and its knowledge bases, calls MCP tools when it needs live facts, and streams the answer back token by token.
The MCP sports-data server is a standalone .NET service that exposes the platform’s offer and statistics as a small set of read-only tools. Dokko calls it directly during generation; the proxy never touches it.
The platform’s existing data APIs sit behind the MCP server, which is a thin adapter over them.

The thing I want to draw attention to is what doesn’t cross a boundary. The platform’s front-end never calls a model provider. Dokko never sees the platform’s internal tenant identifiers. The raw sports data is never exported, fine-tuned into a model, or copied into the AI vendor’s storage — it’s fetched read-only, at question time, from behind the operator’s own APIs. Each component knows exactly as much as it needs and not one field more.

What Dokko Gave Us for Free

Because the conversational stack belongs to Dokko, the proxy’s job shrinks to plumbing. It’s worth walking through what that plumbing actually consumes, because each item is something the platform didn’t have to build.

Sessions are Dokko’s. The first prompt of a conversation is sent with a null session ID; Dokko creates the session and returns its ID, and the proxy persists that ID so every follow-up continues the same conversation with full context. Message history is retrievable from Dokko directly, which is how the chat window repopulates after a page reload. The platform stores a session ID and nothing about the conversation’s contents.

Tenancy is a header. Every call to Dokko carries Authorization: Bearer <token> and X-Tenant: <tenant id>. The tenant header is how Dokko isolates this deployment — sessions created under one tenant can’t be read under another. The platform’s own multi-tenant hierarchy stays entirely on the platform side; Dokko only ever sees one tenant per environment.
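To make the session and tenancy rules concrete, here is a minimal Python sketch of the proxy side, assuming an in-memory store. The class and function names (`SessionStore`, `dokko_headers`, `prompt_body`) are hypothetical; the header names and body fields are the ones quoted in this post, and the actual HTTP call is left out because the endpoint isn’t described here.

```python
from typing import Dict, Optional


class SessionStore:
    """Remembers which Dokko session each platform user is continuing.

    Only the session ID is kept; the conversation itself lives in Dokko."""

    def __init__(self) -> None:
        self._by_user: Dict[str, str] = {}

    def current(self, user_id: str) -> Optional[str]:
        # None means "first message": Dokko creates the session and returns its ID.
        return self._by_user.get(user_id)

    def remember(self, user_id: str, session_id: str) -> None:
        # Called once Dokko has returned the ID for a newly created session.
        self._by_user[user_id] = session_id


def dokko_headers(api_token: str, tenant_id: str) -> Dict[str, str]:
    # Every call carries the bearer token plus the single tenant for this environment.
    return {"Authorization": f"Bearer {api_token}", "X-Tenant": tenant_id}


def prompt_body(store: SessionStore, user_id: str, widget_id: str,
                message: str, language: str) -> dict:
    # Field names follow the prompt payload shown in this post; "params" omitted here.
    return {
        "chat_widget_configuration_id": widget_id,
        "message": message,
        "session_id": store.current(user_id),
        "language": language,
    }
```

A production proxy would persist the mapping somewhere durable rather than in a dictionary, but the shape stays the same: the platform keeps one ID per user and nothing about what was said.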

Context rides along in an opaque token. This is the most elegant part of the integration, and it’s worth slowing down for. Every prompt includes a params object, and inside it the proxy packs a base64-encoded token carrying the user’s organisational scope and time zone:

{
  "chat_widget_configuration_id": "<guid>",
  "message": "what's on tonight?",
  "session_id": "<guid or null on first message>",
  "language": "en",
  "params": {
    "token": "<base64: org scope + timezone>",
    "date": "<current UTC, rounded to the hour>"
  }
}

Dokko doesn’t unpack that token. It forwards it, untouched, to the MCP server with every tool call. So when the assistant looks up “what’s on tonight,” the data lookup is automatically scoped to the right catalogue segment and resolved in the user’s local time — and the model never sees, handles, or could leak a raw tenant identifier. The platform threaded per-user context all the way through to its own data layer without ever exposing internal IDs to the AI. That’s the division of labour expressed in a single field.
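The post doesn’t specify what is inside the token beyond “org scope + timezone”, so the sketch below assumes a small JSON object that is base64-encoded. It shows both ends: the proxy packing the `params` object, and the MCP server unpacking the token that Dokko forwarded untouched. The post says the date is “rounded to the hour”; this sketch truncates to the start of the hour, which is an assumption about the exact rule. All function names are hypothetical.

```python
import base64
import json
from datetime import datetime, timezone


def pack_context_token(org_scope: str, tz_name: str) -> str:
    # Hypothetical layout: a JSON object, base64-encoded so it travels as an opaque string.
    raw = json.dumps({"scope": org_scope, "tz": tz_name}).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def unpack_context_token(token: str) -> dict:
    # Runs only on the MCP server; Dokko and the model never decode it.
    return json.loads(base64.b64decode(token).decode("utf-8"))


def hour_floor_utc(now: datetime) -> str:
    # Expects a timezone-aware datetime; truncates to the start of the UTC hour.
    return now.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0).isoformat()


def build_params(org_scope: str, tz_name: str, now: datetime) -> dict:
    # The "params" object the proxy attaches to every prompt.
    return {
        "token": pack_context_token(org_scope, tz_name),
        "date": hour_floor_utc(now),
    }
```

Base64 here is an encoding, not protection: anything that sees the token could decode it. What keeps identifiers away from the model in this design is that Dokko passes the token through as an opaque value rather than surfacing it in the conversation.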

Streaming is Dokko’s too. Answers arrive over Dokko’s real-time WebSocket endpoint as a typed event stream: a start event, repeated delta chunks of text, and an end event that carries token-usage figures. The proxy holds one socket per active user, subscribes to the channel named after the session ID, and relays approved chunks to the client. Reconnects use exponential backoff and re-subscribe to live channels; sessions orphaned by a disconnect are torn down after a grace period. This is the piece that makes the product feel modern — the answer types out in real time — and the platform wrote none of it. It consumes a stream.
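The event sequence described above (start, repeated deltas, end with usage) can be consumed with very little code. This is a minimal sketch that assumes each event arrives as a dictionary with a hypothetical `type` field plus `text` or `usage`; the real wire format, channel subscription and grace-period cleanup are not shown. The backoff helper uses the common “exponential with cap and full jitter” scheme, chosen here for illustration; the post doesn’t say which variant the proxy uses.

```python
import random
from typing import Callable, Iterable, Optional


def relay_stream(events: Iterable[dict],
                 send_to_client: Callable[[str], None]) -> Optional[dict]:
    """Forward delta text to the client and return the usage from the end event."""
    started = False
    for event in events:
        kind = event.get("type")
        if kind == "start":
            started = True
        elif kind == "delta" and started:
            send_to_client(event["text"])  # e.g. pushed on to the SignalR connection
        elif kind == "end":
            return event.get("usage")  # token figures for metering
    return None  # stream ended without an end event, e.g. a dropped socket


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  jitter: bool = True) -> float:
    # Exponential growth (base * 2^attempt), capped, optionally randomised in [0, delay].
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay
```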

Usage metering is raw data from Dokko, policy on the platform. The end event reports token consumption. The proxy records it per user and enforces the platform’s own tiers and quotas on top — deciding, before each prompt, whether a user may still talk to the assistant. Dokko provides the meter reading; the business rule stays with the operator. That split recurs everywhere in this design: Dokko supplies a primitive, the platform supplies the judgement.
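The split between meter reading and business rule might look like this on the proxy side. The tier names, limits and the `total_tokens` key are assumptions for illustration; the post only says the end event reports token consumption. Period resets (daily, monthly) are left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UsageLedger:
    # Hypothetical token allowance per tier, owned entirely by the platform.
    tier_limits: Dict[str, int]
    used: Dict[str, int] = field(default_factory=dict)

    def record(self, user_id: str, usage: dict) -> None:
        # Fed with the usage figures from Dokko's end event.
        self.used[user_id] = self.used.get(user_id, 0) + usage.get("total_tokens", 0)

    def may_prompt(self, user_id: str, tier: str) -> bool:
        # Checked before each prompt is forwarded; Dokko never sees this rule.
        return self.used.get(user_id, 0) < self.tier_limits[tier]
```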

Grounding: The Assistant Answers From Live Data, Not Memory

A language model asked “who’s missing for the match this weekend?” will happily produce a confident, fluent, completely fabricated answer. For a sports assistant that is worse than useless — the one thing users can trivially check is exactly the thing the model is worst at. So the defining design rule is that the assistant answers from live platform data, never from model memory.

This is where MCP earns its place. The platform’s MCP server exposes its data as a small set of tools that Dokko calls during generation, and the agent is instructed never to answer an offer or statistics question from memory — it must call a tool. The tools fall into two groups.

Offer-discovery tools answer “what’s being played / what’s on” against the live catalogue. The workhorse finds matches filterable by team, opponent, sport, category (country or circuit), tournament, and a time window resolved in the user’s zone — returning full identification for each event, start time, live status, and current score for in-play matches. Two helper tools resolve free-text names to canonical IDs: one for teams, one for tournaments. A miss returns an explicit { "not_found": true } so the model knows to say “I couldn’t find that” rather than invent something.
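The real MCP server is a .NET service, so the following is only a Python sketch of the tool logic, not of the MCP plumbing around it. It assumes an in-memory event list and a made-up team lookup standing in for the platform’s data APIs; `Event`, `TEAM_IDS`, `resolve_team` and `find_matches` are hypothetical names, and the team IDs are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    event_id: str
    home_id: str
    away_id: str
    tournament_id: str
    start_utc: datetime


# Hypothetical stand-in for the platform's team lookup API.
TEAM_IDS: Dict[str, str] = {"germany": "T-DE", "denmark": "T-DK", "sweden": "T-SE"}


def resolve_team(name: str) -> dict:
    team_id = TEAM_IDS.get(name.strip().lower())
    if team_id is None:
        # Explicit miss, so the model says "I couldn't find that" instead of guessing.
        return {"not_found": True}
    return {"team_id": team_id}


def find_matches(events: Iterable[Event], team_id: Optional[str] = None,
                 tournament_id: Optional[str] = None,
                 window_start: Optional[datetime] = None,
                 window_end: Optional[datetime] = None,
                 limit: int = 20) -> List[Event]:
    hits = []
    for e in events:
        if team_id is not None and team_id not in (e.home_id, e.away_id):
            continue
        if tournament_id is not None and e.tournament_id != tournament_id:
            continue
        if window_start is not None and e.start_utc < window_start:
            continue
        if window_end is not None and e.start_utc >= window_end:
            continue
        hits.append(e)
    # Earliest kick-off first, capped so the model never receives an unbounded list.
    return sorted(hits, key=lambda e: e.start_utc)[:limit]
```

Returning `{"not_found": true}` as ordinary data, rather than raising, keeps the miss inside the conversation where the model can act on it.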

Statistics tools answer “how are they doing / who’s missing / what happened last time”: recent form (last matches per team), head-to-head history between two sides, current streaks, and unavailable players split by side. All of them are read-only, validate their inputs, cap result sizes, time out against the upstream APIs, and return structured errors instead of failing silently — properties that keep tool calls predictable for the model and stop a slow upstream from stalling a conversation.
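The four properties listed above (validated inputs, capped results, upstream timeouts, structured errors) fit in one small wrapper. This sketch assumes the upstream call is passed in as a `fetch` function; the cap, the timeout value and the error codes are hypothetical. Note that a timed-out worker thread is not cancelled here, so a real service would also set a timeout on the HTTP client itself.

```python
import concurrent.futures
from typing import Callable, List

MAX_MATCHES = 10          # hypothetical cap on result size
UPSTREAM_TIMEOUT_S = 3.0  # hypothetical upstream timeout in seconds

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def recent_form(team_id: str, last_n: int,
                fetch: Callable[[str, int], List],
                timeout_s: float = UPSTREAM_TIMEOUT_S) -> dict:
    # 1. Validate inputs before touching the upstream API.
    if not team_id:
        return {"error": "invalid_argument", "detail": "team_id is required"}
    if not 1 <= last_n <= MAX_MATCHES:
        return {"error": "invalid_argument",
                "detail": f"last_n must be between 1 and {MAX_MATCHES}"}
    # 2. Bound the wait so a slow upstream cannot stall the conversation.
    future = _pool.submit(fetch, team_id, last_n)
    try:
        results = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return {"error": "upstream_timeout"}
    except Exception as exc:
        # 3. Structured error instead of a silent failure.
        return {"error": "upstream_failure", "detail": str(exc)}
    # 4. Cap what reaches the model, whatever upstream returned.
    return {"team_id": team_id, "matches": results[:last_n]}
```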

The power isn’t in any single tool; it’s in the chaining. Because the agent has both groups available, one ordinary-sounding question becomes a small pipeline of grounded calls:

How a single question becomes a chain of tool calls — the agent resolves the tournament, searches the live offer within a time window, fetches form and missing players for the found event, then composes a streamed answer with every claim grounded in a tool result.

A question like “who’s in better form ahead of the big match this weekend?” typically resolves the tournament first, feeds that ID into an offer search constrained to the coming days, pulls streaks and missing players for the event it finds, and only then composes an answer, streamed back token by token, with every factual claim traceable to a tool result. The platform supplies the facts; Dokko supplies the reasoning and the language. Hallucination has very little room to enter, because the model is never the source of truth: it’s the narrator.
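In the real system the model decides which tools to call and in what order, guided by the tool descriptions and its instructions. The deterministic function below is only a sketch that makes the data dependencies of that chain visible: each step consumes an ID produced by the previous one, and the result records which tool supplied each fact. The tool names and return shapes are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Dict


def answer_form_question(tools: Dict[str, Callable[..., Any]], tournament_name: str,
                         now: datetime, days_ahead: int = 3) -> dict:
    trace = []  # every fact in the final answer should trace back to one of these

    t = tools["resolve_tournament"](tournament_name)
    trace.append(("resolve_tournament", t))
    if t.get("not_found"):
        return {"answer_basis": None, "trace": trace}  # relay the miss honestly

    events = tools["find_matches"](tournament_id=t["tournament_id"],
                                   window_start=now,
                                   window_end=now + timedelta(days=days_ahead))
    trace.append(("find_matches", events))
    if not events:
        return {"answer_basis": None, "trace": trace}

    event = events[0]
    streaks = tools["current_streaks"](event["event_id"])
    missing = tools["unavailable_players"](event["event_id"])
    trace += [("current_streaks", streaks), ("unavailable_players", missing)]

    # The model writes the prose; this is the grounded material it narrates from.
    return {"answer_basis": {"event": event, "streaks": streaks, "missing": missing},
            "trace": trace}
```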

There’s a layer of small, unglamorous engineering that makes this robust against how people actually type. All input is normalised to English before any tool call, so the same logic serves every market. A shared alias map of roughly 120 colloquial names for four dozen canonical sports (“football” → Soccer, “F1” → Formula 1, “UFC” → Martial Arts) makes the name-based parameters forgiving. And the tournament-first calling discipline is encoded right into the MCP tool descriptions, so the choreography travels with the tools rather than living in fragile prompt text. That last detail matters more than it looks: the instruction to “resolve the competition, then search” is metadata on the tool, which means it’s versioned and discoverable, not buried in a system prompt.
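Both ideas can be sketched briefly. The alias entries below include the three examples from the post plus two illustrative additions (“soccer”, “formula one”); the production map is much larger. The tool definition follows the general shape of an MCP tool (`name`, `description`, `inputSchema`), but the wording and fields are hypothetical.

```python
from typing import Optional

# Illustrative subset of the alias map; the real one has roughly 120 entries.
SPORT_ALIASES = {
    "football": "Soccer",
    "soccer": "Soccer",
    "f1": "Formula 1",
    "formula one": "Formula 1",
    "ufc": "Martial Arts",
}
CANONICAL_SPORTS = set(SPORT_ALIASES.values())


def normalise_sport(name: str) -> Optional[str]:
    # Lower-case and collapse whitespace so "  F1 " and "f1" match the same key.
    key = " ".join(name.lower().split())
    if key in SPORT_ALIASES:
        return SPORT_ALIASES[key]
    for canonical in CANONICAL_SPORTS:  # accept canonical names in any casing
        if canonical.lower() == key:
            return canonical
    return None  # unknown sport: let the tool report it rather than guess


# The calling discipline lives in the tool's own description.
FIND_MATCHES_TOOL = {
    "name": "find_matches",
    "description": ("Search the live offer. If the user names a competition, "
                    "call resolve_tournament first and pass its tournament_id."),
    "inputSchema": {
        "type": "object",
        "properties": {
            "sport": {"type": "string",
                      "description": "Sport name; aliases such as 'F1' are accepted."},
            "tournament_id": {"type": "string",
                              "description": "Canonical ID from resolve_tournament."},
        },
    },
}
```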

Behaviour Is Configuration, Not Code

Here’s a question worth sitting with: where does the assistant’s personality live? Its scope, its refusals, its tone, the discipline that makes it call a tool instead of guessing? In a lot of systems the answer is “scattered across the backend in prompt fragments and if-statements.” In this one, it lives in exactly one place — the agent instructions maintained inside Dokko — and the platform code doesn’t know it exists.

That instruction set carries a surprising amount of the system’s design, written in plain language:

A hard scope lock. The agent is defined as the platform’s official assistant and nothing else. It answers about the current offer, statistics, sports, and markets — and politely declines everything else, redirecting back to what it can actually help with. Attempts to override its instructions or change its role get a standard refusal and no further engagement.
Template variables. Dokko substitutes the current date into the instructions at runtime via a placeholder, giving the model a reliable anchor for “this weekend” or “tomorrow” without any prompt assembly on the platform side.
Tool choreography in prose. The grounding discipline — never answer from memory, resolve tournaments before searching, combine form and head-to-head when both apply, relay a not-found honestly in the user’s language — is spelled out as behavioural rules, reinforcing the metadata on the tools themselves.
Language mirroring. The agent always replies in the language the user wrote in. This is the single rule that lets one instruction set serve a four-market deployment (English, German, Swedish, Danish) with no language-specific code in the proxy.
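The template-variable mechanism is simple, and it runs inside Dokko rather than on the platform. Purely to illustrate the idea, here is a sketch with a hypothetical `{{current_date}}` placeholder; the post doesn’t show Dokko’s actual placeholder syntax, and the instruction text is a paraphrase of the rules above, not the production wording.

```python
from datetime import date

# Hypothetical excerpt; the real placeholder syntax and wording are not shown in this post.
INSTRUCTIONS = (
    "You are the platform's official sports assistant. Today is {{current_date}}. "
    "Never answer offer or statistics questions from memory; call a tool. "
    "Always reply in the language the user wrote in."
)


def render_instructions(template: str, today: date) -> str:
    # Substitute the runtime date so "this weekend" and "tomorrow" have an anchor.
    return template.replace("{{current_date}}", today.isoformat())
```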

Change any of this — tighten the scope, adjust the tone, add a market — and it’s a single edit inside Dokko, versioned, with no deployment on the platform side. The operator’s domain policy is written once, in language a domain expert can read and review, and the engineering team never has to translate it into code. For me this is the quiet headline of the whole project: the most important “code” in the system isn’t code at all.

The Hard Part Is Knowing It’s a Competitor

One line in the scope lock looks trivial and isn’t: don’t engage with questions about competitors. Stating the rule is easy. The hard part is recognising when a question is one — and that turned out to be one of the more interesting problems we worked through while building this.

The naive instinct is a blocklist: collect the rival operators' names and reject any prompt that contains one. It falls apart almost immediately. There are dozens of competitors, each with brand variants, abbreviations, and the way people actually type them at 11pm — misspelled, lowercased, half-remembered. The question rarely names anyone outright: “is their app better for tennis?”, “where can I get better odds on this?”, “what does the other site have for this match?”. It arrives in four languages. And the moment a blocklist exists, it’s trivially evaded by a typo. A list of strings is simultaneously too narrow — it misses every variant you didn’t anticipate — and too broad, since a legitimate question can mention a brand for entirely innocent reasons. You end up maintaining a growing, multilingual dictionary that is wrong in both directions at once.

The thing a blocklist can’t see is intent. “Is their app better for tennis?” contains no competitor name at all, yet it’s unmistakably a comparison question. So instead of matching strings, the system classifies meaning. This is exactly the kind of job Dokko Skills are built for — small, per-question behaviours activated by a classifier that runs alongside retrieval and scores what the user is trying to do, not which words they used. A question that reads as “compare us to another operator” trips the skill regardless of whether it names a brand, misspells one, or names none; one that simply mentions a brand in passing doesn’t. Because every prompt is normalised to English before this stage, a single classifier covers all four markets instead of four hand-tuned pattern files.
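We can’t reproduce Dokko’s classifier here, so the sketch below injects the scorer as a function. The point is only the shape of the decision: a skill activates when the intent score for the normalised English question clears a threshold, with no string matching on brand names anywhere. `Skill`, `active_skills` and the threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Skill:
    name: str
    intent: str        # intent label the classifier scores, e.g. "compare_operators"
    threshold: float   # hypothetical activation threshold


def active_skills(question_en: str, skills: List[Skill],
                  score: Callable[[str, str], float]) -> List[str]:
    """Names of skills whose intent score clears their threshold.

    `question_en` is the prompt after normalisation to English, so one
    scorer serves every market."""
    return [s.name for s in skills if score(question_en, s.intent) >= s.threshold]
```

In use, the scorer would be the platform-side classifier; in a test it can be any stub that returns scores for known questions.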

It’s also another instance of the through-line in this whole build: defence in depth, expressed as configuration. The same boundary is drawn in three different places by three different mechanisms — the proxy’s coarse input filter, the agent’s plain-language scope lock, and the semantic skill that catches the cases the other two miss. None of them is the single point of failure, and not one of them is a regular expression trying to enumerate every way a person might mention the competition.

Keeping Final Say

Operating in a regulated space means the operator has to retain editorial control over what the assistant says — that can’t be delegated to any AI vendor, ours included. The division-of-labour architecture makes this straightforward precisely because the integration surface is so clean. Discrete prompt submission on one side and a typed event stream on the other give the platform two natural interception points: it screens inbound prompts before they ever reach Dokko, and it inspects each streamed chunk before relaying it to the user, with the option to redact, hold, or stop delivery.
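The two interception points can be sketched as a prompt screen and a guarded relay. The checks and the inspection function are placeholders, because, as the next paragraph says, the real policies belong to the operator. The `Verdict` values mirror the options named above; what happens to held chunks (here, handed to a callback for review) is an assumption.

```python
from enum import Enum
from typing import Callable, Iterable, Iterator


class Verdict(Enum):
    PASS = "pass"
    REDACT = "redact"
    HOLD = "hold"
    STOP = "stop"


def screen_prompt(prompt: str, checks: Iterable[Callable[[str], bool]]) -> bool:
    # Inbound: forward to Dokko only if every operator check passes.
    return all(check(prompt) for check in checks)


def guarded_relay(chunks: Iterable[str],
                  inspect: Callable[[str], Verdict],
                  on_hold: Callable[[str], None] = lambda chunk: None,
                  redaction: str = "[removed]") -> Iterator[str]:
    # Outbound: every streamed chunk is inspected before the user sees it.
    for chunk in chunks:
        verdict = inspect(chunk)
        if verdict is Verdict.STOP:
            return  # stop delivery; nothing further reaches the user
        if verdict is Verdict.HOLD:
            on_hold(chunk)  # kept back, e.g. for review
            continue
        yield redaction if verdict is Verdict.REDACT else chunk
```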

The point for this discussion isn’t the specific checks — those are the operator’s business and vary by jurisdiction. The point is that Dokko didn’t have to implement any of them. A clean request/stream boundary meant the operator could wrap its own policy layer around the assistant without Dokko knowing or caring what that policy is. The same boundary that let us hand off the conversational stack also let the platform keep the one thing it can’t delegate: the final say.

Why This Is the Pattern, Not Just a Project

Strip away the sports specifics and the shape of this system generalises to almost any serious AI deployment. There is a layer of capability that is genuinely undifferentiated — orchestration, sessions, streaming, multi-language, metering, tool-calling infrastructure — and there is a layer that is irreducibly yours: your data, your domain rules, your perimeter. The teams that succeed are the ones that draw that line deliberately instead of letting it get drawn for them by whichever vendor’s SDK they reached for first.

Dokko is built for exactly that line. Connecting live, tenant-scoped business data is a matter of standing up an MCP server and attaching it to an agent configuration. There is no fine-tuning and no data export: the data stays behind your APIs and is fetched read-only at question time. The opaque-context mechanism threads your scoping through to your own tools without exposing internal identifiers. Tenant isolation keeps deployments separate. And because the whole conversational stack is delivered as a service, the only things you build are the things that are actually yours.

It’s a useful frame even outside a betting platform or a sports feed. If you’re putting an assistant in front of your users this year, the most valuable architectural question isn’t “which model?” It’s “which half of this is mine?” Answer that one well and the rest gets a lot simpler.

The World Cup will be over in a few weeks. The questions won’t stop — there’s always a next tournament, a next season, a next fixture someone wants to ask about in their own words. The platforms that answer those questions well will be the ones that spent their engineering effort on their data and their rules, and let someone else own the plumbing. That’s not a compromise. It’s the design.

If you’re weighing how to add a grounded, multilingual assistant to your own product — and where to draw that line between the intelligence and the data — that’s exactly the conversation we like having. Come find us.