The 7-layer security pipeline
Network isolation keeps threats out. The security pipeline handles threats that come through the front door — authenticated users who paste sensitive data, attempt prompt injection, or trigger content policy violations.
Every request passes through seven layers before the AI model sees it:
Layer 1: Rate limiting
Configurable per-user rate limits prevent abuse and contain the blast radius if an account is compromised. A stolen session can't be used to exfiltrate data at scale.
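As a minimal sketch, a sliding-window limiter of the kind this layer describes could look like the following. The window size, per-user budget, and in-memory storage are illustrative assumptions, not the production configuration:

```python
import time
from collections import defaultdict, deque

# Assumed values for illustration; the real limits are configurable per user.
WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 30

_request_log: dict[str, deque] = defaultdict(deque)

def allow_request(user_id: str) -> bool:
    """Return True if this user is still under their per-window budget."""
    now = time.monotonic()
    window = _request_log[user_id]
    # Evict timestamps that have aged out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_WINDOW:
        return False  # over budget: block before any downstream layer runs
    window.append(now)
    return True
```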
Layer 2: Input DLP (Data Loss Prevention)
Before any prompt reaches the model, automated pattern matching scans for sensitive data — credit card numbers, national ID formats, classified document markers. Matches are blocked with a clear error message to the user, and the event is logged.
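A stripped-down version of that scan, with two illustrative patterns standing in for the organization-specific rule set:

```python
import re

# Illustrative patterns only; the deployed rule set is client-specific.
DLP_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
    "classification_marking": re.compile(r"\b(TOP SECRET|SECRET|CONFIDENTIAL)\b"),
}

def scan_for_sensitive_data(text: str) -> list[str]:
    """Return the names of every DLP rule that matches the text."""
    return [name for name, pattern in DLP_PATTERNS.items() if pattern.search(text)]
```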
Layer 3: Prompt injection detection
A detection engine with over a dozen regex-based patterns identifies common prompt injection attempts — "ignore previous instructions," role-switching attacks, delimiter manipulation. Flagged requests are blocked and logged for security review.
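Three representative patterns give the flavor; the shapes below are assumptions, and the real engine carries over a dozen:

```python
import re

# Representative signatures; the production pattern set is larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\byou are now (a|an)\b", re.IGNORECASE),           # role switching
    re.compile(r"(^|\n)\s*(system|assistant)\s*:", re.IGNORECASE),  # delimiter manipulation
]

def detect_prompt_injection(prompt: str) -> bool:
    """True if any known injection signature appears in the prompt."""
    return any(p.search(prompt) for p in INJECTION_PATTERNS)
```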
Layer 4: AI Policy Guard
This layer enforces organizational policies on what the AI can and cannot discuss. Requests that pass DLP and injection checks are evaluated against a configurable policy set before reaching the model. Web search requests (via Tavily) pass through this gate too — the model can't fetch external content without policy approval.
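The policy schema and keyword matching in this sketch are assumptions; the real policy set is configurable and organization-specific:

```python
from dataclasses import dataclass

@dataclass
class Policy:
    name: str
    blocked_keywords: list[str]

def evaluate_policies(prompt: str, policies: list[Policy]) -> str | None:
    """Return the name of the first violated policy, or None if permitted."""
    lowered = prompt.lower()
    for policy in policies:
        if any(keyword in lowered for keyword in policy.blocked_keywords):
            return policy.name
    return None
```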
Layer 5: Output DLP
The model's response gets the same DLP treatment as the input. If the model generates output containing sensitive patterns (which can happen with RAG over internal documents), the response is blocked before it reaches the user.
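Conceptually this is the layer-2 scanner pointed at the response. Assuming the `scan_for_sensitive_data` sketch from above is reused:

```python
def filter_model_output(reply: str) -> str:
    """Run the same DLP scan from layer 2 against the model's response."""
    if scan_for_sensitive_data(reply):
        return "Response blocked: the reply matched a data loss prevention rule."
    return reply
```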
Layer 6: CSP headers
Content Security Policy headers prevent XSS attacks and restrict which external resources the application can load. This is defense at the browser level — even if an attacker injected malicious content into a response, the browser won't execute it.
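Assuming a FastAPI backend (the framework is our assumption, not stated here), the header injection can be a small middleware; the policy string itself is illustrative:

```python
from fastapi import FastAPI, Request

app = FastAPI()

# Illustrative policy; the production CSP is tuned to the app's actual assets.
CSP = "default-src 'self'; script-src 'self'; object-src 'none'; frame-ancestors 'none'"

@app.middleware("http")
async def add_security_headers(request: Request, call_next):
    response = await call_next(request)
    response.headers["Content-Security-Policy"] = CSP
    return response
```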
Layer 7: CSRF protection
Cross-Site Request Forgery tokens on every state-changing request prevent attackers from tricking authenticated users into making unintended API calls.
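A minimal sketch of the token mechanics, using only the standard library; how tokens are bound to sessions is an implementation detail we leave out:

```python
import hmac
import secrets

def issue_csrf_token() -> str:
    """Mint a per-session token for the client to echo back on writes."""
    return secrets.token_urlsafe(32)

def verify_csrf_token(session_token: str, submitted_token: str) -> bool:
    """Constant-time comparison avoids leaking token bytes via timing."""
    return hmac.compare_digest(session_token, submitted_token)
```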
The pipeline is ordered intentionally. Cheap checks (rate limiting, pattern matching) run first. Expensive checks (AI Policy Guard) run last. A brute-force attack is stopped at layer 1, never reaching the layers that cost compute time.
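Reusing the sketches above, the ordering reduces to a short-circuiting loop; the check list and context shape are illustrative:

```python
CHECKS = [
    ("rate_limit",   lambda ctx: allow_request(ctx["user_id"])),
    ("input_dlp",    lambda ctx: not scan_for_sensitive_data(ctx["prompt"])),
    ("injection",    lambda ctx: not detect_prompt_injection(ctx["prompt"])),
    ("policy_guard", lambda ctx: evaluate_policies(ctx["prompt"], ctx["policies"]) is None),
]

def run_pipeline(ctx: dict) -> str | None:
    """Return the name of the first failing layer, or None if the request passes."""
    for name, check in CHECKS:
        if not check(ctx):
            return name  # stop here: later, costlier layers never execute
    return None
```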
Identity, access, and why we store zero credentials
Authentication uses Microsoft Entra ID with the MSAL PKCE (Proof Key for Code Exchange) flow — the same SSO the client's employees use for every other internal system. No new passwords, no separate identity store.
Inside the application, Entra ID app roles control who can access what. Administrators see usage analytics. Regular users see the chat. The roles map to the client's existing organizational structure.
For service-to-service authentication, every Azure resource uses Managed Identities. The application talks to Cosmos DB, AI Search, and AI Foundry using identity-based access. No connection strings with passwords. No API keys in config files. Azure Key Vault holds the few third-party secrets (like the Tavily API key), and even Key Vault is accessed through a Managed Identity.
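In Python, assuming the azure-identity and azure-keyvault-secrets packages, the pattern looks like this; the vault URL and secret name are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential resolves to the Managed Identity when running in Azure;
# no connection string or API key appears anywhere in the code.
credential = DefaultAzureCredential()
vault = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net", credential=credential
)
tavily_key = vault.get_secret("tavily-api-key").value  # hypothetical secret name
```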
The result: there are zero stored credentials anywhere in the codebase or deployment configuration. If someone clones the repository, they get nothing usable.
The audit trail (and what we deliberately don't log)
Every security-relevant event — authentication, authorization decisions, rate limit hits, DLP blocks, policy guard triggers — is logged as a structured event to Application Insights. Security teams can query these events, build alerts, and investigate incidents.
What we don't log: prompt text. This is a deliberate privacy-by-design decision. Conversation content stays in Cosmos DB, where it's protected by the same network isolation and access controls as everything else. Diagnostic logs capture the metadata (who, when, what type of action, what security layer fired) without capturing the content. A security analyst can see that User X triggered a DLP block at 14:32 without seeing what they typed.
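A sketch of the shape of such an event, assuming stdlib logging with an Application Insights handler attached elsewhere; the field names are illustrative:

```python
import logging

logger = logging.getLogger("security")

def log_security_event(event_type: str, user_id: str, layer: str) -> None:
    """Record who, when, and which layer fired, never the prompt text."""
    logger.info(
        "security_event",
        extra={"event_type": event_type, "user_id": user_id, "layer": layer},
    )

# e.g. log_security_event("dlp_block", user_id="user-x", layer="input_dlp")
```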
Making it useful: RAG over corporate documents
Security without utility is just an expensive firewall. The system is actually useful because of RAG — retrieval-augmented generation over the client's internal document corpus.
Azure AI Search with semantic ranking indexes the client's documents. When a user asks a question, the system runs a concurrent pipeline: semantic search over the document index, history compaction for conversation context, and model inference. The results converge into a response grounded in the organization's actual data.
The fail-open strategy is deliberate: if the search index is temporarily unavailable, the model falls back to its base knowledge rather than returning an error. The user gets a slightly less specific answer instead of a broken experience. The degradation is logged so operations can respond, but the user never hits a wall.
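Both behaviors, the concurrent fan-out and the fail-open fallback, fit in one asyncio sketch; the three stage functions are hypothetical stand-ins for the AI Search query, history compaction, and model call:

```python
import asyncio
import logging

logger = logging.getLogger("rag")

# Hypothetical stage functions; the real calls target AI Search and AI Foundry.
async def semantic_search(question: str) -> list[str]: ...
async def compact_history(history: list[str]) -> str: ...
async def generate(question: str, docs: list[str], context: str) -> str: ...

async def answer(question: str, history: list[str]) -> str:
    # Launch retrieval and history compaction concurrently.
    search_task = asyncio.create_task(semantic_search(question))
    compact_task = asyncio.create_task(compact_history(history))
    try:
        docs = await search_task
    except Exception:
        # Fail open: log the degradation, answer from base knowledge.
        logger.warning("search index unavailable, falling back to base knowledge")
        docs = []
    context = await compact_task
    return await generate(question, docs, context)
```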
Web search through Tavily adds real-time information when the model determines it needs current data. Every web search request passes through the AI Policy Guard gate first. The model can't search for anything the organization hasn't approved as a searchable topic.
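Assuming the tavily-python client and the `evaluate_policies` sketch from layer 4, the gate is simply a check before the outbound call:

```python
from tavily import TavilyClient  # assumes the tavily-python package

def gated_web_search(query: str, policies: list, client: TavilyClient) -> dict:
    """Run the Policy Guard before any outbound search leaves the VNet."""
    violated = evaluate_policies(query, policies)
    if violated is not None:
        raise PermissionError(f"web search blocked by policy: {violated}")
    return client.search(query)
```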
What makes this different from "just using Azure OpenAI"
This is the question we get most often. Azure OpenAI Service already offers private endpoints and content filtering. Why build all of this on top?
Here's the comparison:
| Capability | Public ChatGPT / Wrapper Apps | "Just" Azure OpenAI Service | This Architecture |
|---|---|---|---|
| Data transit | Public internet | Private Endpoint available (optional) | Private Endpoints enforced, public access disabled |
| Network isolation | None | VNet integration possible | Full VNet with 4 dedicated subnets |
| Input DLP | None | Basic content filters | Custom DLP with org-specific patterns |
| Prompt injection defense | None | None (manual implementation needed) | Automated detection engine, 14+ patterns |
| Output DLP | None | Content filters only | Full output scanning with org-specific rules |
| Identity | Email/password or API key | Entra ID possible | Entra ID with MSAL PKCE, app roles, Managed Identities |
| Credential storage | API keys in env vars | Connection strings possible | Zero stored credentials (Managed Identity + Key Vault) |
| Audit trail | Usage logs only | Azure Monitor logs | Structured security events with privacy-by-design |
| Web search | Uncontrolled | Not included | Policy-gated through AI Policy Guard |
| Infrastructure | Manual / ClickOps | Partial Terraform possible | 100% Terraform, fully reproducible |
The gap between "using Azure OpenAI" and "building a secure AI system on Azure" is the difference between buying a lock and building a security system. The lock is a component. The system is the architecture around it.
Deployment and reproducibility
The entire infrastructure deploys from a single `terraform apply`. Every resource — VNet, subnets, Private DNS zones, Private Endpoints, App Service, Cosmos DB, AI Search, Key Vault, Application Gateway, WAF policies — is defined in code and version-controlled.
CI/CD runs through GitHub Actions with OIDC federation. No long-lived credentials in the pipeline. The deployment identity authenticates to Azure using federated tokens that expire automatically.
Why this matters: if the client needs to rebuild the entire environment in a different Azure region (disaster recovery, compliance, or geographic expansion), it's a parameter change and a Terraform run. The architecture is documented in code, not in someone's memory or a wiki page that hasn't been updated since the initial deployment.
Frequently Asked Questions
How long does an architecture like this take to build?
Our engagement ran about 10 weeks from kickoff to production, with a team of two senior engineers. Most of the calendar time went into security review cycles with the client's team, not writing code. The Terraform infrastructure took about a week to build; the application and security pipeline took four to six weeks. Testing, hardening, and documentation filled the remainder.
Does this work with AWS or GCP instead of Azure?
The architectural patterns — VNet isolation, private endpoints, managed identities, layered security — exist on all three clouds. The specific implementation would change (AWS PrivateLink instead of Azure Private Endpoints, IAM roles instead of Managed Identities, Bedrock instead of AI Foundry), but the security model translates directly. We chose Azure here because the client's identity and compliance infrastructure was already Microsoft-based.
What compliance frameworks does this architecture support?
The combination of network isolation, audit logging, DLP, access controls, and infrastructure-as-code provides the technical controls needed for SOC 2, ISO 27001, and defense-sector security requirements. The privacy-by-design approach (no prompt logging in diagnostics) also supports GDPR and CCPA compliance. The specific certifications depend on organizational policies around the technology, not just the technology itself.
Can we use open-source models instead of GPT-4o?
Yes. The architecture doesn't depend on OpenAI's models. Azure AI Foundry supports multiple model families, and the security pipeline (DLP, prompt injection detection, output scanning) works regardless of which model processes the request. We've tested with both GPT-4o and smaller models for different use cases within the same deployment.
What does this cost compared to a ChatGPT Enterprise subscription?
ChatGPT Enterprise runs about $60/user/month. This architecture costs more upfront (infrastructure, engineering, deployment) but gives you complete control over data, security policies, and the AI's behavior. For organizations handling classified or regulated data, the comparison isn't really cost — it's whether you can use AI at all without this level of control. Most organizations in defense, healthcare, and financial services can't. If you're evaluating secure AI deployment for your organization, we'd welcome the conversation.