Writing B2B SLAs When Core Services Rely on OpenAI Uptime
Compliance 20 May 2026 · 6 min

By wGrow Project Team

99.9% uptime allows around 43 minutes and 49 seconds of downtime per month, averaged across the calendar year. We build on OpenAI. OpenAI drops. We cannot absorb the SLA penalty for their server issues.

That is the opening conversation with every Singapore SME procurement team before we sign anything. Not a negotiating tactic. Arithmetic.

The Commercial Math Nobody Wants to Run

Figure: Monthly downtime impact (illustrative). 99.9% SLA allowance: 43.8 min. Major API outage: 90.0 min. Compares the 43.8-minute monthly allowance of a standard 99.9% SLA against a representative 90-minute foundational model API outage.

OpenAI’s status history records a multi-hour API and ChatGPT outage on November 8, 2023 (status.openai.com/history), disrupting production workloads across its customer base — including ours. A separate entry in that same status history, dated June 4, 2024, logs elevated errors and latency across the API (status.openai.com/history). Without an upstream carve-out in your MSA, either type of incident can land on your side of the liability ledger — even when the root cause sits entirely outside your stack.

Take a $4,000/month contract (illustrative, but in the right ballpark) with a clause that issues a 10% service credit the moment monthly uptime drops below 99.9%. A single 90-minute outage pushes you roughly 46 minutes past the allowance. On a $4,000 engagement, that credit is $400. You have paid to be the middleman for someone else’s infrastructure failure.
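The arithmetic above can be checked with a short sketch. It assumes an average month derived from a 365.25-day year and a flat credit that triggers as soon as downtime exceeds the allowance; a real contract may define the measurement period and credit schedule differently, and the function names are illustrative.

```python
# Assumption: an "average month" is 1/12 of a 365.25-day year.
MINUTES_PER_MONTH = 365.25 * 24 * 60 / 12  # 43,830 minutes


def downtime_allowance_minutes(sla_percent: float) -> float:
    """Monthly downtime permitted by an uptime SLA, in minutes."""
    return MINUTES_PER_MONTH * (1 - sla_percent / 100)


def service_credit(monthly_fee: float, outage_minutes: float,
                   sla_percent: float = 99.9, credit_rate: float = 0.10) -> float:
    """Flat credit owed once downtime exceeds the allowance, else zero."""
    allowance = downtime_allowance_minutes(sla_percent)
    return monthly_fee * credit_rate if outage_minutes > allowance else 0.0


allowance = downtime_allowance_minutes(99.9)
overage = 90 - allowance          # minutes past the allowance for a 90-minute outage
credit = service_credit(4000, 90)

print(f"allowance: {allowance:.2f} min")
print(f"overage:   {overage:.2f} min")
print(f"credit:    ${credit:.2f}")
```

The allowance comes out at roughly 43.8 minutes, so a single 90-minute outage lands about 46 minutes over and triggers the full 10% credit.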

This is the baseline risk of building agentic applications on third-party model endpoints. A software studio that signs a standard SLA for an LLM-powered product without carving out foundational model availability is accepting liability it cannot operationally control.

The Third-Party AI Dependency Clause

For agentic workflows, we do not sign standard software SLAs. We use a modified Master Services Agreement containing what we call the Third-Party AI Dependency clause.

The clearest test came during a Singapore statutory board project — a public-facing chatbot integrated into their service portal. Legal initially rejected our standard MSA outright. They wanted full vendor accountability: one throat to choke, one SLA number, full stop. Understandable. That is how government procurement is trained to think about software.

Three sessions later, the negotiation had restructured how we define uptime entirely.

The clause separates Application Uptime from Model Uptime. Application Uptime covers our infrastructure: the web layer, the API gateway, the orchestration layer, the database. We own those. We SLA those. 99.5% with standard credits is defensible because we control the stack.

Model Uptime sits in a different box. It covers the availability of the primary LLM endpoint — OpenAI, Anthropic, or whichever foundational model the architecture depends on. Failures here are classified as Third-Party AI Dependency Events, sitting under a modified force majeure definition. Our obligations: detect and log the failure within five minutes; activate fallback routing within ten minutes; notify the client within thirty minutes with a link to the upstream provider’s status page. What we are not obligated to do is issue credits for downtime causally attributable to the upstream endpoint — provided we can demonstrate the failure boundary with telemetry.
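One way to picture the failure boundary is as interval arithmetic over telemetry. The sketch below is illustrative only: it assumes downtime is logged as intervals in minutes since the start of the billing month, that upstream outage intervals do not overlap one another, and that the five, ten and thirty minute deadlines are measured from the start of the upstream failure. The names `Interval`, `creditable_minutes` and `obligations_met` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Interval:
    start: float  # minutes since the start of the billing month
    end: float


def overlap(a: Interval, b: Interval) -> float:
    """Minutes during which both intervals are active."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def creditable_minutes(service_down: List[Interval],
                       upstream_down: List[Interval]) -> float:
    """Downtime not attributable to an upstream outage.

    Assumes the upstream intervals do not overlap each other.
    """
    total = 0.0
    for down in service_down:
        excused = sum(overlap(down, up) for up in upstream_down)
        total += (down.end - down.start) - excused
    return total


# Deadlines in minutes, taken from the obligations described above.
DEADLINES = {"detected": 5, "fallback_active": 10, "client_notified": 30}


def obligations_met(elapsed: Dict[str, float]) -> bool:
    """elapsed maps each obligation to minutes since the upstream failure began."""
    return all(elapsed.get(name, float("inf")) <= limit
               for name, limit in DEADLINES.items())


service_down = [Interval(100, 190), Interval(500, 520)]
upstream_down = [Interval(95, 185)]
print(creditable_minutes(service_down, upstream_down))
print(obligations_met({"detected": 3, "fallback_active": 8, "client_notified": 25}))
```

In this made-up month, 85 of the 110 downtime minutes overlap the upstream outage, leaving 25 minutes that would count against Application Uptime.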

Those three sessions worked because procurement had to accept a structural reality: LLM APIs are not hosted databases or CDNs. Those services carry decades of reliability engineering. LLM endpoints do not. The SLA framework has to reflect that gap.

Graceful Degradation Is Not Optional

Figure: Fallback Routing. Client UI, Router, OpenAI API, Local 8B Model.

A watertight contract does not stop users from getting frustrated. The statutory board’s legal team accepted our clause. That still left an engineering problem: what does the UI do when the OpenAI API drops?

Returning a 500 server error is a failure of architecture, not a failure of the API.

On an SME customer service portal we shipped in Q3 2024, we implemented a three-tier routing strategy. The primary layer calls GPT-4o. If that endpoint returns errors or exceeds a 4-second timeout, the orchestration layer reroutes to a self-hosted Llama 3.1 8B instance running on a Singapore-region VM. Llama 3.1 8B handles classification, FAQ retrieval, and short-form response generation adequately. It cannot match GPT-4o on nuanced reasoning, and running a local instance adds hosting overhead, but it keeps the interface alive.

If the local model also goes down, the system falls back to a cached response layer: pre-generated answers to the 40 highest-frequency queries, served with a degraded-mode banner that reads, “We are currently operating in limited capacity. Responses may be shorter than usual.” Users see something. Ticket volume to human agents spikes modestly. The 500 error page never appears.
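The three tiers can be sketched as a simple fallback chain. This is a minimal illustration rather than the production code: the provider signature, the stub models and the cached entry are hypothetical, and each provider is assumed to raise an exception on an error response or when the timeout is exceeded.

```python
from typing import Callable, Dict

# Banner text quoted from the article; everything else here is illustrative.
DEGRADED_BANNER = ("We are currently operating in limited capacity. "
                   "Responses may be shorter than usual.")

# Hypothetical provider signature: (query, timeout_seconds) -> answer text.
# A provider raises an exception on an error response or a timeout.
Provider = Callable[[str, float], str]


def answer(query: str, primary: Provider, local: Provider,
           cache: Dict[str, str], timeout_s: float = 4.0) -> dict:
    """Try the primary model, then the local model, then the cached answers."""
    for tier, call in (("primary", primary), ("local", local)):
        try:
            return {"tier": tier, "text": call(query, timeout_s), "banner": None}
        except Exception:
            # Broad catch keeps the sketch short; production code would
            # catch the client's specific error and timeout types.
            continue
    # Last tier: pre-generated answers keyed by normalised query text.
    # A miss returns text=None so the UI can offer a human handoff (assumption).
    return {"tier": "cache", "text": cache.get(query.strip().lower()),
            "banner": DEGRADED_BANNER}


# Stub providers and a made-up cache entry, for demonstration only.
def failing_model(query: str, timeout_s: float) -> str:
    raise TimeoutError("simulated upstream timeout")


def local_model(query: str, timeout_s: float) -> str:
    return f"[local 8B] short answer to: {query}"


cache = {"what are your opening hours?": "We are open 9am to 6pm, Monday to Friday."}

print(answer("What are your opening hours?", failing_model, local_model, cache)["tier"])
print(answer("What are your opening hours?", failing_model, failing_model, cache)["text"])
```

Every path returns a renderable payload, so the UI never has to surface a raw 500.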

The routing logic is a priority queue with health checks polling every 30 seconds. No exotic orchestration framework. The harder decision is which model handles which query class in degraded mode — that is a product call, not an infrastructure one.
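A rough sketch of priority-ordered routing with periodic health checks follows. It makes two simplifying assumptions that differ from a production setup: an ordered list stands in for the priority queue, and health state is refreshed lazily when it is older than 30 seconds instead of by a background poller. The class name, tier names and probe callables are hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

POLL_INTERVAL_S = 30.0  # health checks are refreshed at most this often


class HealthRouter:
    """Picks the highest-priority tier whose last health probe succeeded."""

    def __init__(self, tiers: List[Tuple[str, Callable[[], bool]]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._tiers = tiers            # ordered from highest to lowest priority
        self._clock = clock            # injectable so the sketch can be tested
        self._healthy: Dict[str, bool] = {}
        self._last_poll: Optional[float] = None

    def poll(self) -> None:
        """Run every probe once; a probe that raises counts as unhealthy."""
        for name, probe in self._tiers:
            try:
                self._healthy[name] = bool(probe())
            except Exception:
                self._healthy[name] = False
        self._last_poll = self._clock()

    def pick(self) -> str:
        """Return the first healthy tier, or "cache" if none is healthy."""
        stale = (self._last_poll is None
                 or self._clock() - self._last_poll >= POLL_INTERVAL_S)
        if stale:
            self.poll()
        for name, _ in self._tiers:
            if self._healthy.get(name):
                return name
        return "cache"


# Demonstration with a fake clock and a probe we can flip by hand.
now = [0.0]
upstream_ok = {"value": True}
router = HealthRouter(
    [("gpt-4o", lambda: upstream_ok["value"]), ("llama-3.1-8b", lambda: True)],
    clock=lambda: now[0],
)
print(router.pick())       # primary is healthy
upstream_ok["value"] = False
now[0] = 31.0              # past the poll interval, so probes run again
print(router.pick())       # falls through to the local model
```

The trade-off of polling is visible here: for up to 30 seconds after an upstream failure the router still believes the primary is healthy, which is why a per-request timeout is still needed alongside it.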

Educating Procurement on Where Your Control Ends

Figure: App vs. LLM Downtime. Columns: App Down, LLM Down. Rows: UI Responds?, Data Accessible?, Generative Tasks (offline when the app is down, degraded when the LLM is down), SLA Penalty?

Most Singapore SME founders and IT directors still treat AI as standard SaaS: monthly subscription, defined feature set, uptime guarantee. For agentic applications, that model breaks down. Correcting it is now part of our delivery process.

We run a one-hour onboarding session with every new client’s procurement and IT leads. We walk them through our telemetry dashboard (application response times, model endpoint health checks, the upstream provider status page) and run a simulated incident so they can see exactly which alert fires at which layer. In our experience, that simulation does more to calibrate expectations than any contract clause could.

When an actual incident occurs, we send the client a structured incident report within 48 hours: which layer failed, when, for how long, what the fallback did, and a link to the upstream provider’s post-mortem. Procurement teams can accept AI volatility when you show them the boundary of your control with evidence. What they cannot accept — and should not be asked to — is a vendor who shrugs.
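The fields of such a report map naturally onto a small record type. The sketch below is one possible shape, not the actual template: the field names and the placeholder URL are hypothetical, and it assumes the 48-hour clock starts when the incident begins.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone

REPORT_DEADLINE = timedelta(hours=48)


@dataclass
class IncidentReport:
    failed_layer: str             # e.g. "model endpoint" or "API gateway"
    started_at: datetime          # when the failure began (UTC)
    duration_minutes: float       # how long the layer was unavailable
    fallback_action: str          # what the fallback did during the incident
    upstream_postmortem_url: str  # link to the provider's post-mortem

    def due_by(self) -> datetime:
        # Assumption: the 48-hour window is measured from the incident start.
        return self.started_at + REPORT_DEADLINE


report = IncidentReport(
    failed_layer="model endpoint",
    started_at=datetime(2024, 9, 3, 2, 15, tzinfo=timezone.utc),  # made-up incident
    duration_minutes=90,
    fallback_action="rerouted to local 8B model",
    upstream_postmortem_url="https://status.example.com/incidents/placeholder",
)
print(report.due_by().isoformat())
print(sorted(asdict(report)))
```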

Stop Promising What You Cannot Deliver

Stop promising absolute uptime for agentic workflows. The dependency chain is long, the liability math is punishing, and the tooling is not yet mature enough to absorb it without explicit carve-outs.

The two responsibilities don’t overlap and neither replaces the other: engineers build graceful degradation into the application layer; commercial counsel builds third-party carve-outs into the MSA. Both have to ship before you go to contract.

Building enterprise AI in Singapore today is less about prompting and more about mitigating downstream liability. The contracts and the architecture have to reflect that in equal measure — and the conversation starts, every time, with 43 minutes and 49 seconds.

