
The Hidden Cost of Generative AI Features on Your Website


The bill is only the start

When businesses scope an AI feature, the conversation usually ends at the API cost. A few cents per thousand tokens, a rough forecast of monthly usage, and the number on the proposal looks reasonable. Two quarters later the actual cost is two or three times that, the site is slower, and someone is asking why the support team has not shrunk.

The API bill is real, but it is the smallest line item in the true cost of running generative AI in production. Here is the rest of it, and how to scope an AI feature so the ROI still stacks up when the quieter costs show up.

Latency hits Core Web Vitals and SEO

A language model call is slow by web standards. Even with streaming, a Claude or GPT response typically takes 1-5 seconds to first token and 3-15 seconds to complete an answer of useful length. Put that on the critical path of a page render and your Largest Contentful Paint and Interaction to Next Paint numbers collapse.

Google's Core Web Vitals are a ranking factor. A page that used to render in 1.2 seconds and now renders in 4.5 seconds because a summary is generated server-side on first load will lose organic traffic. We have seen Aussie clients shed 10-30 per cent of organic visits from an AI feature added without thinking about the render path.

The fix is not hard but it has to be designed in. AI output goes on a secondary fetch or a user-triggered action, never blocking initial render. Generated content that is relatively static (product summaries, recommendation blocks) gets pre-computed and cached rather than generated live.
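As a minimal sketch of the caching half of that design (all names here are illustrative, not a specific framework API): serve a pre-computed summary from a cache, and on a miss, render the page without the AI block while regenerating in the background.

```typescript
// Hypothetical sketch: cached AI summaries kept off the render path.
type CacheEntry = { value: string; expiresAt: number };

class SummaryCache {
  private store = new Map<string, CacheEntry>();
  constructor(private ttlMs: number) {}

  get(key: string): string | null {
    const entry = this.store.get(key);
    if (!entry || Date.now() > entry.expiresAt) return null;
    return entry.value;
  }

  set(key: string, value: string): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

async function summaryForProduct(
  cache: SummaryCache,
  productId: string,
  regenerate: (id: string) => Promise<string>,
): Promise<string | null> {
  const cached = cache.get(productId);
  if (cached !== null) return cached;
  // Fire-and-forget: fill the cache so a later request gets the summary.
  void regenerate(productId).then((s) => cache.set(productId, s)).catch(() => {});
  return null; // render the page now, without the AI block
}
```

The key property is that the first render never waits on the model; the generated content only ever appears from cache.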

Hallucination is a liability, not a bug

Every generative model fabricates occasionally. In low-stakes contexts this is a nuisance. On a business website it can be a legal or financial exposure.

A chatbot that confidently quotes a wrong price, recommends the wrong product for a customer's stated needs, or invents a feature you do not actually offer has created a commitment or a misrepresentation. Misleading conduct sits under the Australian Consumer Law (ACL) and the penalties are not theoretical. In regulated industries (health, finance, legal) the exposure is much worse.

Liability protection costs money. It means retrieval that actually grounds the output, guardrails that catch off-policy responses, disclaimers in the right places, logging so you can reconstruct what was said, and human review on high-stakes flows. Budget for it at build time, not after an incident.
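The logging piece is the cheapest of those to get right up front. A sketch of the sort of record worth keeping per exchange, with an illustrative schema rather than any standard one:

```typescript
// Hypothetical log record: enough to reconstruct what the assistant said
// and what grounded it, long after the conversation. Field names assumed.
interface InteractionLog {
  timestamp: string;          // when the exchange happened
  sessionId: string;          // ties one conversation together
  modelVersion: string;       // which model produced the answer
  userMessage: string;
  assistantMessage: string;
  retrievedSources: string[]; // documents the answer was grounded in
  flagged: boolean;           // did a guardrail fire on this response?
}

function buildLogEntry(
  sessionId: string,
  modelVersion: string,
  userMessage: string,
  assistantMessage: string,
  retrievedSources: string[],
  flagged: boolean,
): InteractionLog {
  return {
    timestamp: new Date().toISOString(),
    sessionId,
    modelVersion,
    userMessage,
    assistantMessage,
    retrievedSources,
    flagged,
  };
}
```

Recording the model version and the retrieved sources alongside the messages is what makes an incident reconstructable rather than arguable.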

Rate limits and abuse

Every hosted AI API has rate limits and every public endpoint attracts abuse. A naive implementation that exposes an AI endpoint from your site's frontend is a target. Within days of launch you can expect automated traffic trying to use your API key for unrelated generation, prompt injection to extract your system prompt, or just enough volume to get you throttled out of your own service.

Mitigations are table stakes but they cost something. Server-side API key handling, per-user rate limiting, bot detection (Cloudflare Turnstile or similar), a prompt classifier to reject obviously off-topic queries, and an API spend cap at the vendor level. Without these, your first real abuse event can be a four- or five-figure bill.
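Per-user rate limiting is the simplest of those layers to illustrate. A fixed-window sketch, with thresholds that are placeholders rather than a recommended policy:

```typescript
// Minimal per-user fixed-window rate limiter. Limits are illustrative.
class RateLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();
  constructor(private maxPerWindow: number, private windowMs: number) {}

  allow(userId: string, now = Date.now()): boolean {
    const entry = this.counts.get(userId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // New window for this user.
      this.counts.set(userId, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.maxPerWindow) return false; // over budget
    entry.count += 1;
    return true;
  }
}
```

In production this state would live in something shared like Redis rather than process memory, but the shape of the check is the same: every AI request passes through it before a token is spent.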

Prompt injection is the new XSS

Prompt injection is a class of attack where a user crafts input that manipulates the AI into ignoring its instructions, leaking system prompts, producing offensive content, or calling tools in ways you did not intend. For any assistant that calls functions (booking, quoting, sending emails), injection can have real-world consequences.

There is no perfect defence in 2026. The current state of the art is layered mitigation: input filtering, least-privilege tool design, output validation, and hard boundaries on what the AI is allowed to do without human confirmation. Treating prompt injection seriously adds 15-25 per cent to the build effort. Skipping it is negligent once the feature interacts with real systems.
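One of those layers, sketched: a hard boundary that forces human confirmation before any non-read-only tool call goes through, no matter what the model was persuaded to do. The tool names are hypothetical.

```typescript
// Least-privilege gate: only explicitly read-only tools run unattended.
// Tool names here are made up for illustration.
type ToolCall = { name: string; args: Record<string, unknown> };

const READ_ONLY_TOOLS = new Set(["search_products", "get_opening_hours"]);

function requiresConfirmation(call: ToolCall): boolean {
  // Allowlist, not blocklist: anything unrecognised is treated as
  // side-effecting and routed to a human.
  return !READ_ONLY_TOOLS.has(call.name);
}
```

The design choice that matters is the allowlist direction: an injected instruction can invent tool calls, but it cannot add itself to the read-only set.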

Ongoing evaluation is real engineering work

An AI feature is never done. Vendors upgrade models silently, your content changes, user queries evolve, and the responses drift with all of it. If you are not measuring, you are guessing.

A proper evaluation harness is not a spreadsheet. It is a versioned bank of test queries with expected behaviours, an automated runner that scores responses against a rubric (often using another LLM as judge), a dashboard that tracks drift over time, and a weekly manual sample review for the things automated evals miss.
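A toy version of the runner makes the shape concrete. Here the judge is a keyword rubric for simplicity; in a real harness that scoring function is often another LLM. Every name and rubric below is an assumption for illustration.

```typescript
// Toy eval runner: a versioned bank of cases scored against a rubric.
type EvalCase = { id: string; query: string; mustMention: string[] };

function scoreResponse(c: EvalCase, response: string): number {
  // Rubric: fraction of required facts the answer actually mentions.
  const hits = c.mustMention.filter((m) =>
    response.toLowerCase().includes(m.toLowerCase()),
  );
  return hits.length / c.mustMention.length; // 0..1
}

function runEvals(
  cases: EvalCase[],
  answer: (query: string) => string,
): { id: string; score: number }[] {
  return cases.map((c) => ({
    id: c.id,
    score: scoreResponse(c, answer(c.query)),
  }));
}
```

Run the same case bank after every prompt change and every vendor model update, and drift stops being a surprise.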

Plan to invest the equivalent of 10-20 per cent of the initial build cost per year on evaluation and tuning. Less and you are flying blind.

Model deprecation is a scheduled outage

The lifespan of a specific model version in 2026 is 12-24 months. Anthropic, OpenAI and Google have all retired model versions that production systems depended on, with 6-12 months of notice that many teams still failed to act on.

When a model is deprecated, your prompts need re-tuning for the replacement. Behaviour that was tested and stable on the old model is not necessarily stable on the new one. Plan for a model migration every 12-18 months. Budget time for re-evaluation, prompt adjustment, and cost recalibration, because pricing and tokenisation tend to shift too.

Abstraction helps. Building through a vendor-neutral layer or keeping model choice behind a config flag makes migrations less painful, though never free.
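The config-flag version can be as small as this sketch, where the model identifier is illustrative and only one function in the codebase knows vendor specifics:

```typescript
// Model choice lives in config, so a deprecation is a config change plus
// re-evaluation, not a codebase hunt. The model ID here is a placeholder.
interface ModelConfig {
  provider: "anthropic" | "openai";
  model: string;
}

const ACTIVE: ModelConfig = { provider: "anthropic", model: "example-model-id" };

function completionEndpoint(cfg: ModelConfig): string {
  // The one place that maps config to a vendor API.
  return cfg.provider === "anthropic"
    ? "https://api.anthropic.com/v1/messages"
    : "https://api.openai.com/v1/chat/completions";
}
```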

How to scope an AI feature so ROI actually tracks

A handful of practices we treat as non-negotiable on any AI build.

Write down the metric before the code. What is this feature meant to change? Deflection rate, conversion lift, time to resolve, enquiries per deal. If nobody can name the metric, do not build the feature.

Estimate the full cost. Not just API spend. Add build cost, evaluation harness, ongoing maintenance (2-4 hours per month minimum), a model migration provision, and the opportunity cost of the latency or SEO impact. Compare to the metric uplift. If the numbers do not clear 2-3x in year one, reconsider.
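A back-of-envelope version of that comparison, where every figure is a placeholder to swap for your own estimates:

```typescript
// Illustrative first-year ROI check. All inputs are assumptions.
function firstYearRoi(input: {
  buildCost: number;             // one-off build, incl. eval harness
  apiSpendPerYear: number;
  maintenancePerYear: number;    // e.g. 2-4 hours/month at your rate
  migrationProvision: number;    // set aside for a model migration
  expectedUpliftPerYear: number; // dollar value of the metric change
}): number {
  const totalCost =
    input.buildCost +
    input.apiSpendPerYear +
    input.maintenancePerYear +
    input.migrationProvision;
  return input.expectedUpliftPerYear / totalCost; // want this above 2-3
}
```

With, say, a $20,000 build, $4,000 API spend, $4,000 maintenance, and a $2,000 migration provision against a $30,000 uplift, the ratio is 1.0: under the bar, and a feature worth reconsidering.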

Ship behind a flag. A/B test the AI feature against the current experience. On every AI project where we have done this, something measurable was worse than expected in at least one segment. Better to find out with 10 per cent of traffic than 100.

Set a hard kill switch. When costs spike, quality drops, or a model provider has an outage, you need to disable the feature in seconds without a deploy. Feature flags, not code changes.
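The check itself is trivial; what matters is that it reads runtime config on every request and fails safe. A sketch, with a made-up flag name:

```typescript
// Kill-switch check read per request from runtime config (env var or a
// flag service), so flipping it needs no deploy. Flag name is illustrative.
function aiFeatureEnabled(flags: Record<string, string | undefined>): boolean {
  // Fail safe: anything other than an explicit "on" disables the feature.
  return flags["ai_chat_enabled"] === "on";
}
```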

Review quarterly, not never. Put a calendar entry for a 30-minute review. Costs, metrics, incidents, pending model deprecations. Most AI feature failures come from inattention, not bad design.

The honest summary

Generative AI can deliver real value on a business website. It can also quietly become a line item nobody can justify, tied to a feature nobody wants to switch off because of sunk cost. The difference is in the scoping, not the technology.

If you are weighing up an AI feature and want a clear-eyed view of the total cost and the genuine upside, we are happy to work through the numbers with you before anyone writes a line of code.
