25 April 2026

Building a grounded AI agent for a consultancy website

A chat agent on a company website has exactly one job that matters: never embarrass the company. Being helpful is the easy part. Any model with a system prompt is helpful. Staying inside the truth while anonymous visitors actively try to bend you — that's the engineering problem.

I built the agent for our consultancy's site (mgcuk.tech) around three decisions worth sharing.

Ground in files, not vibes

The agent knows only what's in a versioned content/knowledge/ directory — markdown files covering services, process, case studies, FAQ and contact. The same files a human could read. No fine-tuning, no vector database for a corpus this size: the whole knowledge base is compiled into the system prompt at build time.

The interesting part is the failure mode. The knowledge loader enforces a 12,000-token budget, and when the corpus outgrows it, the build fails: loudly, in CI, before anything ships. The lazy alternative, silently truncating the knowledge, would produce an agent that answers confidently while missing half of what the company does, and you'd find out from a confused client. Fail-closed beats degrade-silently whenever someone's reputation is attached.
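
A minimal sketch of that fail-closed loader, not the production code. The 12,000-token budget is the real figure; the names (KnowledgeFile, estimateTokens, compileKnowledge) are hypothetical, and the four-characters-per-token estimate is an assumption standing in for whatever tokenizer matches your model.

```typescript
// Hypothetical names throughout; only the 12,000-token budget comes from the post.
interface KnowledgeFile {
  path: string;
  body: string;
}

const TOKEN_BUDGET = 12000;

// Rough estimate only (assumption): about 4 characters per token.
// Swap in a real tokenizer for the model you deploy against.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function compileKnowledge(
  files: KnowledgeFile[],
  budget: number = TOKEN_BUDGET
): string {
  // Sort by path so the compiled prompt is identical from build to build.
  const sorted = [...files].sort((a, b) =>
    a.path < b.path ? -1 : a.path > b.path ? 1 : 0
  );
  const compiled = sorted
    .map((f) => `## ${f.path}\n${f.body.trim()}`)
    .join("\n\n");

  const tokens = estimateTokens(compiled);
  if (tokens > budget) {
    // Fail closed: throw so the build stops instead of truncating silently.
    throw new Error(
      `Knowledge base is ~${tokens} tokens, over the ${budget}-token budget.`
    );
  }
  return compiled;
}
```

Called from a build script, an uncaught throw here exits non-zero, which is what makes CI go red. The important design choice is that there is no code path that returns a shortened corpus.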

Rate limiting that fails stricter, not looser

Public endpoint, anonymous users, per-token pricing: you do the math. A limiter backed by Upstash Redis allows 20 requests per 10 minutes per visitor. The detail I care about: when identification fails, the limiter doesn't shrug and allow. It routes the request into a single shared "unknown" bucket with a stricter limit than the normal path. The system's failure posture is "less access", never "more". The same philosophy caps output at 600 tokens and conversation history at 12 messages: bounded cost, bounded attack surface.
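
To make the fail-stricter posture concrete, here is a sketch with an in-memory Map standing in for the Redis counter. The 20-per-10-minutes figure is the real one. The unknown bucket's limit of 5, the fixed-window algorithm and every name below are assumptions chosen for illustration.

```typescript
// In-memory stand-in for the Redis counter, for illustration only.
const WINDOW_MS = 10 * 60 * 1000;
const VISITOR_LIMIT = 20;
const UNKNOWN_LIMIT = 5; // assumed value; the post only says "stricter"

interface Bucket {
  count: number;
  windowStart: number;
}

class FixedWindowLimiter {
  private buckets = new Map<string, Bucket>();

  // Returns true when the request is allowed.
  // A null or empty visitorId means identification failed.
  allow(visitorId: string | null, now: number = Date.now()): boolean {
    // Fail stricter: every unidentified caller shares one tighter bucket.
    const key = visitorId ? `visitor:${visitorId}` : "unknown";
    const limit = visitorId ? VISITOR_LIMIT : UNKNOWN_LIMIT;

    let bucket = this.buckets.get(key);
    if (!bucket || now - bucket.windowStart >= WINDOW_MS) {
      // Start a fresh window for this key.
      bucket = { count: 0, windowStart: now };
      this.buckets.set(key, bucket);
    }
    if (bucket.count >= limit) {
      return false;
    }
    bucket.count += 1;
    return true;
  }
}
```

Because all unidentified traffic shares one key, stripping identifying headers buys an attacker a smaller allowance, not a fresh one.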

Test the refusals, not just the answers

The agent's test suite covers four surfaces: knowledge loading (including the token-budget failure), the streaming route, rate limiting, and lead capture. The tests I find most valuable assert what the agent won't do — answer off-topic questions, exceed its budget, keep talking past the limits. Anyone can demo the happy path; the refusal paths are where a public agent lives or dies.
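
A refusal-style test can be as plain as this sketch. It covers only the two caps mentioned above (12 messages of history, 600 output tokens), because an off-topic refusal needs a model call and doesn't fit in a snippet. boundRequest and check are hypothetical helpers, not the real suite.

```typescript
// Hypothetical request-shaping helper plus refusal-style checks.
// The caps are from the post; the names are assumptions.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

const MAX_HISTORY = 12;
const MAX_OUTPUT_TOKENS = 600;

function boundRequest(messages: ChatMessage[], requestedMaxTokens?: number) {
  const requested = requestedMaxTokens ?? MAX_OUTPUT_TOKENS;
  return {
    // Keep only the most recent messages.
    messages: messages.slice(-MAX_HISTORY),
    // A caller can ask for less output, never more.
    maxOutputTokens: Math.max(1, Math.min(requested, MAX_OUTPUT_TOKENS)),
  };
}

function check(condition: boolean, label: string): void {
  if (!condition) throw new Error(`FAILED: ${label}`);
}

// Build a 30-message conversation, far past the history cap.
const long: ChatMessage[] = Array.from(
  { length: 30 },
  (_, i): ChatMessage => ({
    role: i % 2 === 0 ? "user" : "assistant",
    content: `message ${i}`,
  })
);

const bounded = boundRequest(long, 5000);
check(bounded.messages.length === MAX_HISTORY, "history is capped at 12");
check(bounded.messages[0]?.content === "message 18", "oldest messages are dropped");
check(bounded.maxOutputTokens === MAX_OUTPUT_TOKENS, "output cap cannot be raised");
```

Each check names something the system must refuse to do, so a regression reads as a broken promise, not a changed number.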

What I'd tell anyone shipping one of these

Treat the agent as a product surface with an SLA, not a demo. Ground it in files you version-control. Make every limit fail closed. Write tests for the conversations you hope never happen. None of this needs exotic infrastructure; ours is a Next.js route, a Redis counter and a folder of markdown. The discipline is the architecture.

My own site's agent — the one that will glorify me unreasonably while staying factually scrupulous — is being built on exactly this foundation. The personality is a costume; the grounding underneath is non-negotiable.