Building a Custom GPT or Claude Assistant for Your Business Website
What a custom assistant actually is
The phrase "custom GPT" gets thrown around loosely. In practice, a useful custom assistant on a business website is three things stitched together: a carefully written system prompt that defines tone and boundaries, a grounding layer that feeds in your real business data, and a set of tools the model can call to take action on your behalf.
Drop the grounding and you have a generic chatbot with your logo. Drop the tools and you have a talking FAQ. Drop the prompt and you have ChatGPT. All three together is where the value lives.
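As a minimal sketch of how the three pieces fit into a single model request (every name here, including `retrieve`, `book_consultation`, and the Acme content, is a hypothetical placeholder, not a real implementation):

```python
# Hypothetical sketch: prompt, grounding, and tools combined into one
# request payload. retrieve() stands in for a real retrieval layer.
SYSTEM_PROMPT = "You are the support assistant for Acme Pty Ltd."

def retrieve(question: str) -> list[str]:
    # Stand-in for vector search over real business content.
    docs = {"pricing": "Plans start at AUD 49/month."}
    return [text for key, text in docs.items() if key in question.lower()]

def build_request(question: str) -> dict:
    context = "\n".join(retrieve(question))
    return {
        "system": SYSTEM_PROMPT,  # 1. tone and boundaries
        "messages": [{
            "role": "user",
            # 2. grounding: retrieved business facts travel with the question
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        }],
        # 3. actions the model is allowed to take
        "tools": [{"name": "book_consultation"}],
    }
```

Remove any one of the three keys and you get exactly the degraded variants described above.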
When it actually beats a generic chatbot
A decision-tree chatbot (Intercom Fin, Tidio, Drift) can handle scripted conversations well. It breaks down the moment a user asks something slightly off-script. A custom LLM-backed assistant handles the edges: multi-intent questions, context carried across turns, and the kind of messy real-world phrasing people actually use.
In our experience the economics flip in favour of a custom assistant in three situations.
High-volume support. If your team handles hundreds of similar enquiries a week and the answers live in your documentation, a grounded assistant reliably deflects 30-60 per cent of that volume while leaving the hard cases to humans.
Complex product or service discovery. When a prospect needs to be walked through options, prices, and suitability before they convert, an assistant that can reason across your product data outperforms static comparison pages.
Internal tools with real actions. Booking, quoting, account lookups. Anything where the assistant calls into your systems to actually do work is where the ROI is strongest.
For a simple services site with ten pages and a contact form, you probably do not need this. A well-designed FAQ and a quick contact response beat an AI assistant that costs you fifty cents per conversation.
How we build them
Our standard build uses the Anthropic API with Claude as the default model. The same pattern works with OpenAI. The pieces are the same; the vendor is a preference based on reliability, latency, and pricing at the time of build.
The system prompt
Most of the assistant's personality, behaviour, and safety rails live here. We write it explicitly: who the assistant is, what it must never do, how it should handle out-of-scope questions, what tone to use, and critically, what to say when it does not know. A good system prompt runs 500-1500 words and is version-controlled like any other code artefact.
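A heavily abbreviated skeleton of that structure, expressed as the version-controlled constant it lives as in code (all of the content is illustrative placeholder text, not a real client's prompt):

```python
# Hypothetical skeleton of a production system prompt. Real prompts
# run 500-1500 words; this shows only the section structure.
SYSTEM_PROMPT = """\
You are the website assistant for Acme Pty Ltd.

Role: answer questions about Acme's services, pricing, and bookings,
using only the context provided with each request.
Never: quote prices absent from the context, recommend competitors,
or give legal or medical advice.
Out of scope: politely decline and point to the contact form.
Unknown answers: say you do not know and offer a human handoff.
Tone: plain, friendly, concise.
"""
```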
Grounding with retrieval
The assistant does not rely on the model's training data for anything business-specific. Instead, every user question is used to retrieve relevant content from your own sources: product pages, pricing, support articles, policy documents. This is standard RAG and it is what stops the assistant inventing facts about your business.
For small content sets (under a few hundred pages) we often index into Postgres with pgvector. For larger ones or where latency matters, a dedicated vector store like Pinecone or Qdrant.
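For the pgvector case, the retrieval step reduces to one SQL query. A sketch, assuming a hypothetical `documents(id, content, embedding vector(1536))` table and a psycopg-style named parameter for the query embedding:

```python
# Sketch of a pgvector retrieval query. Table and column names are
# assumptions; `<=>` is pgvector's cosine-distance operator, so the
# smallest distances (closest matches) come first.
TOP_K = 3

def build_retrieval_sql(top_k: int = TOP_K) -> str:
    return (
        "SELECT id, content "
        "FROM documents "
        "ORDER BY embedding <=> %(query_embedding)s::vector "
        f"LIMIT {top_k}"
    )
```

The same query shape works unchanged whether the table holds fifty pages or five hundred; it is at thousands of documents and strict latency budgets that a dedicated store starts to earn its keep.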
Tool use and function calling
This is where assistants become genuinely useful rather than just conversational. We define a set of functions the model can call: check availability, calculate a quote, look up an order, create a ticket, book a consultation. The model decides when to call them, collects the required arguments, and responds with the result.
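A tool definition in the shape the Anthropic API expects: a name, a description the model reads, and a JSON Schema for the arguments it must collect. The quote calculator itself is a hypothetical example:

```python
# Hypothetical "calculate_quote" tool definition. The model uses the
# description to decide when to call it, and the input_schema to know
# which arguments it must gather from the user first.
QUOTE_TOOL = {
    "name": "calculate_quote",
    "description": "Calculate a price quote for a given service and quantity.",
    "input_schema": {
        "type": "object",
        "properties": {
            "service": {"type": "string", "description": "Service identifier"},
            "quantity": {"type": "integer", "minimum": 1},
        },
        "required": ["service", "quantity"],
    },
}
```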
Function calling is also where safety matters most. A tool that writes to production systems needs authorisation, rate limiting, and input validation before it ever touches the assistant.
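One way to enforce that, sketched as a guard wrapper around a hypothetical ticket-creation tool (the authorisation flag and limits are placeholders for whatever your systems actually provide):

```python
# Hypothetical guard around a write-capable tool: the model's arguments
# are validated and the caller authorised before anything is written.
def guarded_create_ticket(args: dict, authorised: bool) -> dict:
    if not authorised:
        return {"ok": False, "error": "not authorised"}
    subject = str(args.get("subject", "")).strip()
    if not subject or len(subject) > 200:
        return {"ok": False, "error": "invalid subject"}
    # Rate limiting and the real ticket-system call would go here.
    return {"ok": True, "ticket_subject": subject}
```

The model never calls the production system directly; it only ever reaches the guard.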
Guardrails
Three layers we always include. A prompt-level refusal for off-topic or unsafe requests. An input filter that blocks known prompt-injection patterns and excessive length. An output filter that checks the response against a minimal policy (no competitor recommendations, no unauthorised price commitments, no medical or legal advice where relevant). None of these are perfect, but together they catch the majority of real-world misbehaviour.
Cost modelling: what it actually costs to run
Vendor pricing changes regularly, but the shape of the cost is stable. You pay per token for the model, for embeddings, and for the infrastructure that runs around it.
A representative production assistant we built in 2026 for a mid-sized services business averages roughly AUD 0.04-0.08 per completed conversation using Claude Sonnet with retrieval and one or two tool calls. At 3,000 conversations per month that is around AUD 120-240 in model costs, plus the vector database and hosting.
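The per-conversation figure falls out of simple token arithmetic. A back-of-envelope sketch, where the token counts and the AUD per-million-token prices are illustrative assumptions rather than current vendor pricing:

```python
# Back-of-envelope per-conversation cost model. All numbers below are
# illustrative assumptions, not quoted vendor prices.
def conversation_cost(input_tokens: int, output_tokens: int,
                      in_price_per_m: float, out_price_per_m: float) -> float:
    return ((input_tokens / 1e6) * in_price_per_m
            + (output_tokens / 1e6) * out_price_per_m)

# e.g. ~6,000 input tokens (system prompt plus retrieved context over a
# few turns) and ~1,200 output tokens:
cost = conversation_cost(6000, 1200, 4.5, 22.5)  # 0.054
```

At assumed prices like these the example lands at about AUD 0.054, inside the 0.04-0.08 range above.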
Two traps we watch for. Long conversation histories can quietly multiply token usage because the whole transcript is re-sent on every turn. Caching the system prompt and aggressive trimming of prior turns keeps this under control. And RAG context that is too large is the second offender: retrieving ten 2,000-token chunks per query when three would do is pure waste.
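The trimming itself is a one-liner. A minimal sketch, with the message budget as an assumed tunable:

```python
MAX_HISTORY_MESSAGES = 8  # assumed budget; tune per product

def trim_history(messages: list[dict]) -> list[dict]:
    # Keep only the most recent messages. The system prompt travels
    # separately (and can be prompt-cached), so it is never trimmed.
    return messages[-MAX_HISTORY_MESSAGES:]
```

A fancier version summarises the dropped turns into one short message rather than discarding them, at the cost of an extra model call.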
We always put a hard spend cap at the API level and alert on anomalous usage. It stops a bot or a bug turning into a five-figure bill.
What to actually measure
Without measurement, an AI assistant becomes a feature you paid for and cannot defend. We instrument every assistant we ship with at least these:
- Deflection rate. Conversations that resolved without creating a support ticket or form submission. This is the primary ROI metric.
- Escalation quality. When the assistant handed off to a human, did it include useful context, and did the human have to start over?
- Answer quality. A sample of conversations reviewed weekly against a rubric: correct, partially correct, wrong, unsafe. Early on this needs to be manual. Later it can be semi-automated with an LLM-as-judge.
- Cost per resolved conversation. Model spend divided by deflected conversations, tracked against the cost of the support team handling them.
- Unsafe outputs. A count of responses that triggered the output filter or were flagged in review. It should be near zero and trending down.
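The two headline numbers reduce to simple ratios. A sketch with illustrative figures (the 1,400 / 3,000 / AUD 180 values are made up for the example):

```python
# The primary ROI metrics as plain ratios. Input figures here are
# illustrative, not from a real deployment.
def deflection_rate(resolved_without_handoff: int, total_conversations: int) -> float:
    return resolved_without_handoff / total_conversations

def cost_per_resolved(model_spend: float, deflected: int) -> float:
    return model_spend / deflected

# e.g. 1,400 of 3,000 conversations resolved, AUD 180 model spend:
rate = deflection_rate(1400, 3000)        # ~0.47
unit_cost = cost_per_resolved(180, 1400)  # ~AUD 0.13 per resolution
```

Comparing that unit cost against the loaded cost of a human handling the same enquiry is what makes the ROI case defensible.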
Where it tends to go wrong
Shipping without evaluation is the single most common failure. The assistant looks great in the demo, quietly degrades under real traffic, and nobody notices for weeks because nobody is reading the logs.
Scope creep comes second. An assistant sold as a sales helper ends up being asked to do HR, legal, and technical support. Each new surface is another prompt injection vector and another evaluation problem. Keep the scope tight and add capabilities deliberately.
If you are thinking about a custom assistant for your business, or you have one in production and it is not performing, we are happy to have a practical conversation about whether it is the right fit and what the build would look like.
