The 7-layer security pipeline
Network isolation keeps threats out. The security pipeline handles threats that come through the front door — authenticated users who paste sensitive data, attempt prompt injection, or trigger content policy violations.
Every request passes through seven layers before the AI model sees it:
Layer 1: Rate limiting
Configurable per-user rate limits prevent abuse and contain the blast radius if an account is compromised. A stolen session can't be used to exfiltrate data at scale.
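As a minimal sketch, a sliding-window limiter of the kind this layer describes could look like the following. The window size, per-user budget, and in-memory storage are illustrative assumptions, not the production configuration:

```python
import time
from collections import defaultdict, deque

# Assumed values for illustration; the real limits are configurable per user.
WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 30

_request_log: dict[str, deque] = defaultdict(deque)

def allow_request(user_id: str) -> bool:
    """Return True if this user is still under their per-window budget."""
    now = time.monotonic()
    window = _request_log[user_id]
    # Evict timestamps that have aged out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_WINDOW:
        return False  # over budget: block before any downstream layer runs
    window.append(now)
    return True
```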
Layer 2: Input DLP (Data Loss Prevention)
Before any prompt reaches the model, automated pattern matching scans for sensitive data — credit card numbers, national ID formats, classified document markers. Matches are blocked with a clear error message to the user, and the event is logged.
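A stripped-down version of that scan, with two illustrative patterns standing in for the organization-specific rule set:

```python
import re

# Illustrative patterns only; the deployed rule set is client-specific.
DLP_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
    "classification_marking": re.compile(r"\b(TOP SECRET|SECRET|CONFIDENTIAL)\b"),
}

def scan_for_sensitive_data(text: str) -> list[str]:
    """Return the names of every DLP rule that matches the text."""
    return [name for name, pattern in DLP_PATTERNS.items() if pattern.search(text)]
```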
Layer 3: Prompt injection detection
A detection engine with over a dozen regex-based patterns identifies common prompt injection attempts — "ignore previous instructions," role-switching attacks, delimiter manipulation. Flagged requests are blocked and logged for security review.
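Three representative patterns give the flavor; the shapes below are assumptions, and the real engine carries over a dozen:

```python
import re

# Representative signatures; the production pattern set is larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\byou are now (a|an)\b", re.IGNORECASE),           # role switching
    re.compile(r"(^|\n)\s*(system|assistant)\s*:", re.IGNORECASE),  # delimiter manipulation
]

def detect_prompt_injection(prompt: str) -> bool:
    """True if any known injection signature appears in the prompt."""
    return any(p.search(prompt) for p in INJECTION_PATTERNS)
```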
Layer 4: AI Policy Guard
This layer enforces organizational policies on what the AI can and cannot discuss. Requests that pass DLP and injection checks are evaluated against a configurable policy set before reaching the model. Web search requests (via Tavily) pass through this gate too — the model can't fetch external content without policy approval.
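The policy schema and keyword matching in this sketch are assumptions; the real policy set is configurable and organization-specific:

```python
from dataclasses import dataclass

@dataclass
class Policy:
    name: str
    blocked_keywords: list[str]

def evaluate_policies(prompt: str, policies: list[Policy]) -> str | None:
    """Return the name of the first violated policy, or None if permitted."""
    lowered = prompt.lower()
    for policy in policies:
        if any(keyword in lowered for keyword in policy.blocked_keywords):
            return policy.name
    return None
```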
Layer 5: Output DLP
The model's response gets the same DLP treatment as the input. If the model generates output containing sensitive patterns (which can happen with RAG over internal documents), the response is blocked before it reaches the user.
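Conceptually this is the layer-2 scanner pointed at the response. Assuming the `scan_for_sensitive_data` sketch from above is reused:

```python
def filter_model_output(reply: str) -> str:
    """Run the same DLP scan from layer 2 against the model's response."""
    if scan_for_sensitive_data(reply):
        return "Response blocked: the reply matched a data loss prevention rule."
    return reply
```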
Layer 6: CSP headers
Content Security Policy headers prevent XSS attacks and restrict which external resources the application can load. This is defense at the browser level — even if an attacker injected malicious content into a response, the browser won't execute it.
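Assuming a FastAPI backend (the framework is our assumption, not stated here), the header injection can be a small middleware; the policy string itself is illustrative:

```python
from fastapi import FastAPI, Request

app = FastAPI()

# Illustrative policy; the production CSP is tuned to the app's actual assets.
CSP = "default-src 'self'; script-src 'self'; object-src 'none'; frame-ancestors 'none'"

@app.middleware("http")
async def add_security_headers(request: Request, call_next):
    response = await call_next(request)
    response.headers["Content-Security-Policy"] = CSP
    return response
```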
Layer 7: CSRF protection
Cross-Site Request Forgery tokens on every state-changing request prevent attackers from tricking authenticated users into making unintended API calls.
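A minimal sketch of the token mechanics, using only the standard library; how tokens are bound to sessions is an implementation detail we leave out:

```python
import hmac
import secrets

def issue_csrf_token() -> str:
    """Mint a per-session token for the client to echo back on writes."""
    return secrets.token_urlsafe(32)

def verify_csrf_token(session_token: str, submitted_token: str) -> bool:
    """Constant-time comparison avoids leaking token bytes via timing."""
    return hmac.compare_digest(session_token, submitted_token)
```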
The pipeline is ordered intentionally. Cheap checks (rate limiting, pattern matching) run first. Expensive checks (AI Policy Guard) run last. A brute-force attack is stopped at layer 1, never reaching the layers that cost compute time.
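Reusing the sketches above, the ordering reduces to a short-circuiting loop; the check list and context shape are illustrative:

```python
CHECKS = [
    ("rate_limit",   lambda ctx: allow_request(ctx["user_id"])),
    ("input_dlp",    lambda ctx: not scan_for_sensitive_data(ctx["prompt"])),
    ("injection",    lambda ctx: not detect_prompt_injection(ctx["prompt"])),
    ("policy_guard", lambda ctx: evaluate_policies(ctx["prompt"], ctx["policies"]) is None),
]

def run_pipeline(ctx: dict) -> str | None:
    """Return the name of the first failing layer, or None if the request passes."""
    for name, check in CHECKS:
        if not check(ctx):
            return name  # stop here: later, costlier layers never execute
    return None
```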
Identity, access, and why we store zero credentials
Authentication uses Microsoft Entra ID with the MSAL PKCE (Proof Key for Code Exchange) flow — the same SSO the client's employees use for every other internal system. No new passwords, no separate identity store.
Inside the application, Entra ID app roles control who can access what. Administrators see usage analytics. Regular users see the chat. The roles map to the client's existing organizational structure.
For service-to-service authentication, every Azure resource uses Managed Identities. The application talks to Cosmos DB, AI Search, and AI Foundry using identity-based access. No connection strings with passwords. No API keys in config files. Azure Key Vault holds the few third-party secrets (like the Tavily API key), and even Key Vault is accessed through a Managed Identity.
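In Python, assuming the azure-identity and azure-keyvault-secrets packages, the pattern looks like this; the vault URL and secret name are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential resolves to the Managed Identity when running in Azure;
# no connection string or API key appears anywhere in the code.
credential = DefaultAzureCredential()
vault = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net", credential=credential
)
tavily_key = vault.get_secret("tavily-api-key").value  # hypothetical secret name
```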
The result: there are zero stored credentials anywhere in the codebase or deployment configuration. If someone clones the repository, they get nothing usable.
The audit trail (and what we deliberately don't log)
Every security-relevant event — authentication, authorization decisions, rate limit hits, DLP blocks, policy guard triggers — is logged as a structured event to Application Insights. Security teams can query these events, build alerts, and investigate incidents.
What we don't log: prompt text. This is a deliberate privacy-by-design decision. Conversation content stays in Cosmos DB, where it's protected by the same network isolation and access controls as everything else. Diagnostic logs capture the metadata (who, when, what type of action, what security layer fired) without capturing the content. A security analyst can see that User X triggered a DLP block at 14:32 without seeing what they typed.
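A sketch of the shape of such an event, assuming stdlib logging with an Application Insights handler attached elsewhere; the field names are illustrative:

```python
import logging

logger = logging.getLogger("security")

def log_security_event(event_type: str, user_id: str, layer: str) -> None:
    """Record who, when, and which layer fired, never the prompt text."""
    logger.info(
        "security_event",
        extra={"event_type": event_type, "user_id": user_id, "layer": layer},
    )

# e.g. log_security_event("dlp_block", user_id="user-x", layer="input_dlp")
```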
Making it useful: RAG over corporate documents
Security without utility is just an expensive firewall. The system is actually useful because of RAG — retrieval-augmented generation over the client's internal document corpus.
Azure AI Search with semantic ranking indexes the client's documents. When a user asks a question, the system runs a concurrent pipeline: semantic search over the document index, history compaction for conversation context, and model inference. The results converge into a response grounded in the organization's actual data.
The fail-open strategy is deliberate: if the search index is temporarily unavailable, the model falls back to its base knowledge rather than returning an error. The user gets a slightly less specific answer instead of a broken experience. The degradation is logged so operations can respond, but the user never hits a wall.
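Both behaviors, the concurrent fan-out and the fail-open fallback, fit in one asyncio sketch; the three stage functions are hypothetical stand-ins for the AI Search query, history compaction, and model call:

```python
import asyncio
import logging

logger = logging.getLogger("rag")

# Hypothetical stage functions; the real calls target AI Search and AI Foundry.
async def semantic_search(question: str) -> list[str]: ...
async def compact_history(history: list[str]) -> str: ...
async def generate(question: str, docs: list[str], context: str) -> str: ...

async def answer(question: str, history: list[str]) -> str:
    # Launch retrieval and history compaction concurrently.
    search_task = asyncio.create_task(semantic_search(question))
    compact_task = asyncio.create_task(compact_history(history))
    try:
        docs = await search_task
    except Exception:
        # Fail open: log the degradation, answer from base knowledge.
        logger.warning("search index unavailable, falling back to base knowledge")
        docs = []
    context = await compact_task
    return await generate(question, docs, context)
```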
Web search through Tavily adds real-time information when the model determines it needs current data. Every web search request passes through the AI Policy Guard gate first. The model can't search for anything the organization hasn't approved as a searchable topic.
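Assuming the tavily-python client and the `evaluate_policies` sketch from layer 4, the gate is simply a check before the outbound call:

```python
from tavily import TavilyClient  # assumes the tavily-python package

def gated_web_search(query: str, policies: list, client: TavilyClient) -> dict:
    """Run the Policy Guard before any outbound search leaves the VNet."""
    violated = evaluate_policies(query, policies)
    if violated is not None:
        raise PermissionError(f"web search blocked by policy: {violated}")
    return client.search(query)
```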
What makes this different from "just using Azure OpenAI"
This is the question we get most often. Azure OpenAI Service already offers private endpoints and content filtering. Why build all of this on top?
Here's the comparison:
| Capability | Public ChatGPT / Wrapper Apps | "Just" Azure OpenAI Service | This Architecture |
|---|---|---|---|
| Data transit | Public internet | Private Endpoint available (optional) | Private Endpoints enforced, public access disabled |
| Network isolation | None | VNet integration possible | Full VNet with 4 dedicated subnets |
| Input DLP | None | Basic content filters | Custom DLP with org-specific patterns |
| Prompt injection defense | None | None (manual implementation needed) | Automated detection engine, 14+ patterns |
| Output DLP | None | Content filters only | Full output scanning with org-specific rules |
| Identity | Email/password or API key | Entra ID possible | Entra ID with MSAL PKCE, app roles, Managed Identities |
| Credential storage | API keys in env vars | Connection strings possible | Zero stored credentials (Managed Identity + Key Vault) |
| Audit trail | Usage logs only | Azure Monitor logs | Structured security events with privacy-by-design |
| Web search | Uncontrolled | Not included | Policy-gated through AI Policy Guard |
| Infrastructure | Manual / ClickOps | Partial Terraform possible | 100% Terraform, fully reproducible |
The gap between "using Azure OpenAI" and "building a secure AI system on Azure" is the difference between buying a lock and building a security system. The lock is a component. The system is the architecture around it.
Deployment and reproducibility
The entire infrastructure deploys from a single `terraform apply`. Every resource — VNet, subnets, Private DNS zones, Private Endpoints, App Service, Cosmos DB, AI Search, Key Vault, Application Gateway, WAF policies — is defined in code and version-controlled.
CI/CD runs through GitHub Actions with OIDC federation. No long-lived credentials in the pipeline. The deployment identity authenticates to Azure using federated tokens that expire automatically.
Why this matters: if the client needs to rebuild the entire environment in a different Azure region (disaster recovery, compliance, or geographic expansion), it's a parameter change and a Terraform run. The architecture is documented in code, not in someone's memory or a wiki page that hasn't been updated since the initial deployment.
Frequently Asked Questions
How long does an architecture like this take to build?
Our engagement ran about 10 weeks from kickoff to production, with a team of two senior engineers. Most of the calendar time went into security review cycles with the client's team, not writing code. The Terraform infrastructure took about a week to build; the application and security pipeline took four to six weeks. Testing, hardening, and documentation filled the remainder.
Does this work with AWS or GCP instead of Azure?
The architectural patterns — VNet isolation, private endpoints, managed identities, layered security — exist on all three clouds. The specific implementation would change (AWS PrivateLink instead of Azure Private Endpoints, IAM roles instead of Managed Identities, Bedrock instead of AI Foundry), but the security model translates directly. We chose Azure here because the client's identity and compliance infrastructure was already Microsoft-based.
What compliance frameworks does this architecture support?
The combination of network isolation, audit logging, DLP, access controls, and infrastructure-as-code provides the technical controls needed for SOC 2, ISO 27001, and defense-sector security requirements. The privacy-by-design approach (no prompt logging in diagnostics) also supports GDPR and CCPA compliance. The specific certifications depend on organizational policies around the technology, not just the technology itself.
Can we use open-source models instead of GPT-4o?
Yes. The architecture doesn't depend on OpenAI's models. Azure AI Foundry supports multiple model families, and the security pipeline (DLP, prompt injection detection, output scanning) works regardless of which model processes the request. We've tested with both GPT-4o and smaller models for different use cases within the same deployment.
What does this cost compared to a ChatGPT Enterprise subscription?
ChatGPT Enterprise runs about $60/user/month. This architecture costs more upfront (infrastructure, engineering, deployment) but gives you complete control over data, security policies, and the AI's behavior. For organizations handling classified or regulated data, the comparison isn't really cost — it's whether you can use AI at all without this level of control. Most organizations in defense, healthcare, and financial services can't. If you're evaluating secure AI deployment for your organization, we'd welcome the conversation.