From EPAC to a governance framework
We’ve spent three weeks deep in EPAC territory—learning the tool, building pipelines, and mastering advanced patterns. Now it’s time to zoom out.
A governance framework isn’t just policies. It’s the entire system: the hierarchy that organizes your Azure estate, the policies that enforce standards, the monitoring that tracks compliance, and the automation that fixes drift.
This week, we’re building that system with designs you can actually implement. Management group design drives everything else, and the approach is always the same: start with audit, automate remediation, then enforce.
Governance framework architecture
I think about governance in four layers:
```
┌────────────────────────────────────────────────────────┐
│ Layer 4: Visibility                                    │
│ Compliance dashboards, reporting, alerting             │
└────────────────────────────────────────────────────────┘
                            │
┌────────────────────────────────────────────────────────┐
│ Layer 3: Enforcement                                   │
│ Policy assignments, deny effects, audit effects        │
└────────────────────────────────────────────────────────┘
                            │
┌────────────────────────────────────────────────────────┐
│ Layer 2: Remediation                                   │
│ DeployIfNotExists, Modify effects, remediation tasks   │
└────────────────────────────────────────────────────────┘
                            │
┌────────────────────────────────────────────────────────┐
│ Layer 1: Structure                                     │
│ Management groups, subscriptions, naming conventions   │
└────────────────────────────────────────────────────────┘
```
Here’s how to build each one.
Layer 1: Management group structure
Your management group hierarchy is the foundation—it builds directly on the cloud foundation we covered earlier. Get it wrong, and everything above it becomes harder.
Cloud Adoption Framework recommended structure
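```
Tenant Root Group
│
└── Intermediate Root (your-org-name)
    │
    ├── Platform
    │   ├── Management        # Log Analytics, Automation
    │   ├── Connectivity      # Hub VNets, Firewalls, DNS
    │   └── Identity          # Domain Controllers, AD DS
    │
    ├── Landing Zones
    │   ├── Corp              # Internal applications
    │   │   ├── Corp-Prod
    │   │   └── Corp-NonProd
    │   └── Online            # Internet-facing applications
    │       ├── Online-Prod
    │       └── Online-NonProd
    │
    ├── Sandbox               # Experimentation, no connectivity
    │
    └── Decommissioned        # Subscriptions pending deletion
```
The Prod/NonProd levels under Corp and Online go beyond the CAF reference hierarchy, which generally recommends separating environments with subscriptions inside the same management group rather than with extra management groups. Add them only if production and non-production genuinely need different policy assignments.
Why this structure works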
| Level | Purpose | Policy Focus |
|---|---|---|
| Intermediate Root | Organization-wide standards | Security baselines, allowed regions |
| Platform | Shared services | Stricter controls, limited access |
| Landing Zones | Workload hosting | Workload-specific policies |
| Corp/Online | Connectivity type | Network policies, public exposure |
| Prod/NonProd | Environment | Deployment gates, cost controls |
| Sandbox | Experimentation | Minimal policies, cost limits only |
Note that governance and access control go hand in hand—pair this structure with Privileged Identity Management to enforce just-in-time access at each level. For cost controls at the Sandbox and Prod/NonProd levels, see Azure cost management with budget alerts.
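Before committing a hierarchy to Terraform, it helps to check it against platform limits: Azure supports up to six levels of management groups below the tenant root group, not counting the subscription level. A minimal Python sketch that encodes the tree above as parent pointers (the display names are hypothetical stand-ins for management group IDs) and reports any group deeper than that limit:

```python
from __future__ import annotations

ROOT = "Tenant Root Group"
MAX_DEPTH = 6  # levels below the tenant root, excluding subscriptions

# Parent pointers for the hierarchy above; names stand in for management group IDs
HIERARCHY = {
    "Contoso": ROOT,
    "Platform": "Contoso",
    "Management": "Platform",
    "Connectivity": "Platform",
    "Identity": "Platform",
    "Landing Zones": "Contoso",
    "Corp": "Landing Zones",
    "Corp-Prod": "Corp",
    "Corp-NonProd": "Corp",
    "Online": "Landing Zones",
    "Online-Prod": "Online",
    "Online-NonProd": "Online",
    "Sandbox": "Contoso",
    "Decommissioned": "Contoso",
}


def depth(group: str, parents: dict[str, str]) -> int:
    """Number of levels between `group` and the tenant root (the root itself is 0)."""
    level = 0
    seen: set[str] = set()
    while group != ROOT:
        if group in seen:
            raise ValueError(f"cycle detected at {group!r}")
        seen.add(group)
        group = parents[group]  # KeyError if a parent is missing from the map
        level += 1
    return level


def too_deep(parents: dict[str, str]) -> list[str]:
    """Groups that would exceed the depth limit, sorted by name."""
    return sorted(g for g in parents if depth(g, parents) > MAX_DEPTH)


deepest = max(HIERARCHY, key=lambda g: depth(g, HIERARCHY))
print(f"Deepest group: {deepest} at level {depth(deepest, HIERARCHY)}")
print(f"Over the limit: {too_deep(HIERARCHY)}")
```

For this hierarchy the deepest groups (such as Corp-Prod) sit at level 4, which leaves headroom without encouraging deeper nesting.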
Implementation with Terraform
```hcl
# Management group hierarchy
data "azurerm_client_config" "current" {}

# The tenant root group's ID is the Microsoft Entra tenant ID
data "azurerm_management_group" "tenant_root" {
  name = data.azurerm_client_config.current.tenant_id
}

resource "azurerm_management_group" "intermediate_root" {
  name                       = "contoso" # Fixed ID, referenced as "contoso" by the scripts and exemptions later on
  display_name               = "Contoso"
  parent_management_group_id = data.azurerm_management_group.tenant_root.id
}

# Groups without a name get a generated GUID as their ID
resource "azurerm_management_group" "platform" {
  display_name               = "Platform"
  parent_management_group_id = azurerm_management_group.intermediate_root.id
}

resource "azurerm_management_group" "landing_zones" {
  display_name               = "Landing Zones"
  parent_management_group_id = azurerm_management_group.intermediate_root.id
}

resource "azurerm_management_group" "corp" {
  display_name               = "Corp"
  parent_management_group_id = azurerm_management_group.landing_zones.id
}

resource "azurerm_management_group" "online" {
  display_name               = "Online"
  parent_management_group_id = azurerm_management_group.landing_zones.id
}

resource "azurerm_management_group" "sandbox" {
  display_name               = "Sandbox"
  parent_management_group_id = azurerm_management_group.intermediate_root.id
}

resource "azurerm_management_group" "decommissioned" {
  display_name               = "Decommissioned"
  parent_management_group_id = azurerm_management_group.intermediate_root.id
}

# Management, Connectivity, Identity, and the Prod/NonProd groups follow the same pattern
```
Layer 3: Policy assignment strategy
Policies should flow down the hierarchy, with more specific controls at lower levels.
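Inheritance itself is mechanical: an assignment applies at its own scope and at every descendant, except where the assignment's notScopes list excludes a descendant (exemptions are a separate mechanism and are not modeled here). A minimal Python sketch under a simplified model, where each assignment is reduced to a name, a scope, and a set of excluded scopes, and the assignment names are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Simplified parent pointers (None marks the top of this tree)
PARENTS = {
    "Contoso": None,
    "Platform": "Contoso",
    "Landing Zones": "Contoso",
    "Corp": "Landing Zones",
    "Sandbox": "Contoso",
}


@dataclass
class Assignment:
    name: str
    scope: str
    not_scopes: set[str] = field(default_factory=set)  # mirrors the notScopes property


def ancestors(scope: str) -> list[str]:
    """The scope itself followed by each parent up to the top of the tree."""
    chain = []
    current = scope
    while current is not None:
        chain.append(current)
        current = PARENTS[current]
    return chain


def effective_assignments(scope: str, assignments: list[Assignment]) -> list[str]:
    """Names of the assignments that apply at `scope` after inheritance and exclusions."""
    chain = ancestors(scope)
    applied = []
    for a in assignments:
        if a.scope not in chain:
            continue  # assigned on a different branch of the tree
        if any(s in a.not_scopes for s in chain):
            continue  # this scope, or a parent of it, is excluded
        applied.append(a.name)
    return applied


# Hypothetical assignments loosely modeled on the hierarchy that follows
ASSIGNMENTS = [
    Assignment("allowed-locations", "Contoso"),
    Assignment("deny-public-ip", "Platform"),
    Assignment("require-tags-deny", "Contoso", not_scopes={"Sandbox"}),
    Assignment("audit-expensive-skus", "Sandbox"),
]

for mg in ("Platform", "Corp", "Sandbox"):
    print(mg, effective_assignments(mg, ASSIGNMENTS))
```

The Sandbox case shows why exclusions belong on the parent assignment: Sandbox cannot opt out of a root-level deny on its own.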
Assignment hierarchy
```yaml
Intermediate Root (Contoso):   # Universal security standards
  - Allowed Locations (Deny)
  - Require Resource Tags (Deny)
  - Audit Unencrypted Storage (Audit)
  - Deploy Defender for Cloud (DeployIfNotExists)

Platform:                      # Strict platform controls; inherits everything above
  - Deny Public IP Addresses (Deny)   # Exclude Connectivity via notScopes: Azure Firewall and gateways need public IPs
  - Require Private Endpoints (Deny)
  - Deploy Diagnostic Settings (DeployIfNotExists)

Landing Zones:                 # Workload standards; inherits everything above
  - Audit Resources Without Tags (Audit)
  - Deploy Activity Log Settings (DeployIfNotExists)

Corp:                          # Internal workload controls; inherits everything above
  - Deny Internet Outbound without Firewall (Deny)

Online:                        # Internet-facing controls; inherits everything above
  - Require WAF on Application Gateway (Deny)
  - Audit Missing DDoS Protection (Audit)

Sandbox:                       # Minimal controls
  - Budget Alerts Only (DeployIfNotExists)
  - Audit Expensive SKUs (Audit)
  # Root-level deny policies still inherit here; exclude Sandbox from them
  # with notScopes on those assignments (or with exemptions)
```
Policy effect selection guide
| Scenario | Effect | When to Use |
|---|---|---|
| Block non-compliant deployments | Deny | Well-established standards, critical security |
| Track compliance without blocking | Audit | New policies, understanding current state |
| Auto-configure resources | DeployIfNotExists | Monitoring, logging, security agents |
| Modify existing properties | Modify | Tags, network settings, encryption |
| Block deletion | DenyAction | Protect critical resources from deletion (delete is currently the only supported action) |
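Moving a policy from Audit to Deny works best as an explicit gate rather than a judgment call. A minimal Python sketch of such a gate, assuming two thresholds you set yourself: a 60-day audit window (matching the window used in the tiered enforcement metadata) and a hypothetical 98% compliance target:

```python
from datetime import date

AUDIT_WINDOW_DAYS = 60    # minimum time in Audit before promotion
COMPLIANCE_TARGET = 98.0  # hypothetical target, in percent


def ready_for_deny(audit_start: date, today: date,
                   compliant: int, non_compliant: int) -> bool:
    """True when a policy has been audited long enough and is compliant enough to enforce."""
    evaluated = compliant + non_compliant
    if evaluated == 0:
        return False  # no evaluation data yet, so nothing to base the decision on
    days_in_audit = (today - audit_start).days
    compliance = 100 * compliant / evaluated
    return days_in_audit >= AUDIT_WINDOW_DAYS and compliance >= COMPLIANCE_TARGET


# 985 of 1,000 evaluated resources compliant (98.5%) after 60 days in Audit
print(ready_for_deny(date(2025, 12, 31), date(2026, 3, 1), 985, 15))
```

Raising either threshold trades speed of enforcement for a lower risk of blocking legitimate deployments.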
Example: tiered enforcement
Start with the Audit effect in Phase 1 to understand your current state before blocking anything:
```json
{
  "assignment": {
    "name": "require-storage-https",
    "displayName": "Require HTTPS for Storage Accounts"
  },
  "parameters": {
    "effect": "Audit"
  },
  "metadata": {
    "enforcementPhase": "1-audit",
    "denyDate": "2026-03-01",
    "notes": "Move to Deny after 60 days of compliance monitoring"
  }
}
```
After compliance reaches your target, move to Deny in Phase 2 to enforce the standard:
```json
{
  "parameters": {
    "effect": "Deny"
  },
  "metadata": {
    "enforcementPhase": "2-deny",
    "auditCompletedDate": "2026-02-28",
    "complianceAtEnforcement": "98.5%"
  }
}
```
Layer 2: Remediation architecture
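Policies with DeployIfNotExists and Modify effects can fix non-compliant resources automatically. They act on their own only when a resource is created or updated; resources that existed before the assignment stay non-compliant until you run a remediation task. In both cases the assignment needs a managed identity that holds the roles listed in the definition's roleDefinitionIds.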
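The existence check behind DeployIfNotExists is simple to state: the effect is satisfied if any related resource of the declared type (or only the one named in details.name, when a name is given) matches the existenceCondition; otherwise the resource is non-compliant and needs a deployment. A minimal Python sketch of that check, using hypothetical storage account names and a simplified dictionary shape for diagnostic settings rather than the real ARM payload:

```python
from __future__ import annotations

from typing import Callable

Setting = dict  # simplified: {"name": ..., "logs": [{"category": ..., "enabled": ...}]}
Condition = Callable[[Setting], bool]


def logs_enabled(setting: Setting) -> bool:
    """Existence condition: at least one log category is enabled."""
    return any(log.get("enabled") for log in setting.get("logs", []))


def is_compliant(related: list[Setting], condition: Condition, name: str | None = None) -> bool:
    """Satisfied if ANY related resource (only the one called `name`, if given) meets the condition."""
    candidates = [s for s in related if name is None or s.get("name") == name]
    return any(condition(s) for s in candidates)


def needs_remediation(resources: dict[str, list[Setting]], condition: Condition,
                      name: str | None = None) -> list[str]:
    """Resources a remediation task would deploy to, sorted by name."""
    return sorted(r for r, related in resources.items()
                  if not is_compliant(related, condition, name))


resources = {
    "stprodlogs": [{"name": "storage-diagnostics",
                    "logs": [{"category": "StorageRead", "enabled": True}]}],
    "stdevdata": [],  # no diagnostic settings at all
    "stlegacy": [{"name": "old-setting",
                  "logs": [{"category": "StorageRead", "enabled": False}]}],
}

print(needs_remediation(resources, logs_enabled, name="storage-diagnostics"))
```

Naming the setting makes the check stricter: a correctly configured setting under a different name no longer counts.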
DeployIfNotExists pattern
Use for deploying additional resources (e.g., diagnostic settings):
```json
{
  "parameters": {
    "logAnalyticsWorkspaceId": {
      "type": "String",
      "metadata": { "displayName": "Log Analytics workspace resource ID" }
    }
  },
  "policyRule": {
    "if": {
      "field": "type",
      "equals": "Microsoft.Storage/storageAccounts/blobServices"
    },
    "then": {
      "effect": "DeployIfNotExists",
      "details": {
        "type": "Microsoft.Insights/diagnosticSettings",
        "name": "storage-diagnostics",
        "existenceCondition": {
          "allOf": [
            {
              "field": "Microsoft.Insights/diagnosticSettings/logs.enabled",
              "equals": "true"
            }
          ]
        },
        "roleDefinitionIds": [
          "/providers/Microsoft.Authorization/roleDefinitions/749f88d5-cbae-40b8-bcfc-e573ddc772fa",
          "/providers/Microsoft.Authorization/roleDefinitions/92aaf0da-9dab-42b6-94a3-d43ce8d16293"
        ],
        "deployment": {
          "properties": {
            "mode": "incremental",
            "template": {
              "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
              "contentVersion": "1.0.0.0",
              "parameters": {
                "resourceName": { "type": "string" },
                "logAnalyticsWorkspaceId": { "type": "string" }
              },
              "resources": [
                {
                  "type": "Microsoft.Storage/storageAccounts/blobServices/providers/diagnosticSettings",
                  "apiVersion": "2021-05-01-preview",
                  "name": "[concat(parameters('resourceName'), '/Microsoft.Insights/storage-diagnostics')]",
                  "properties": {
                    "workspaceId": "[parameters('logAnalyticsWorkspaceId')]",
                    "logs": [
                      { "category": "StorageRead", "enabled": true },
                      { "category": "StorageWrite", "enabled": true }
                    ],
                    "metrics": [
                      { "category": "Transaction", "enabled": true }
                    ]
                  }
                }
              ]
            },
            "parameters": {
              "resourceName": { "value": "[field('fullName')]" },
              "logAnalyticsWorkspaceId": { "value": "[parameters('logAnalyticsWorkspaceId')]" }
            }
          }
        }
      }
    }
  }
}
```
StorageRead and StorageWrite are log categories of the storage services (blob, queue, table, file), not of the storage account resource itself, so the rule evaluates blobServices and passes the parent-qualified fullName (accountName/default) into the template. The policy-level logAnalyticsWorkspaceId parameter is supplied at assignment time. The two role definition IDs are Monitoring Contributor and Log Analytics Contributor, which the assignment's managed identity needs to create the setting.
Modify pattern
Use for changing properties on existing resources (e.g., tags):
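```json
{
  "mode": "All",
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.Resources/subscriptions/resourceGroups"
        },
        {
          "field": "tags['Environment']",
          "exists": "false"
        }
      ]
    },
    "then": {
      "effect": "Modify",
      "details": {
        "roleDefinitionIds": [
          "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
        ],
        "operations": [
          {
            "operation": "addOrReplace",
            "field": "tags['Environment']",
            "value": "[if(and(contains(toLower(field('name')), 'prod'), not(contains(toLower(field('name')), 'nonprod'))), 'Production', 'Development')]"
          }
        ]
      }
    }
  }
}
```
Resource group policies need mode All, because Indexed mode does not evaluate resource groups. The value expression lowercases the name because contains() is case-sensitive for strings, and it checks for nonprod explicitly because a name such as rg-app-nonprod also contains prod. The role ID is the built-in Contributor role.
Remediation task automation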
Automate remediation with a scheduled pipeline:
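```powershell
# Get non-compliant states for the effects that support remediation (DeployIfNotExists and Modify)
$nonCompliantStates = Get-AzPolicyState `
    -ManagementGroupName "contoso" `
    -Filter "ComplianceState eq 'NonCompliant' and (PolicyDefinitionAction eq 'deployifnotexists' or PolicyDefinitionAction eq 'modify')" `
    -Top 1000

# One remediation task targets one assignment and, for initiatives, one policy definition reference
$groups = $nonCompliantStates | Group-Object PolicyAssignmentId, PolicyDefinitionReferenceId

foreach ($group in $groups) {
    $first = $group.Group[0]
    $assignmentId = $first.PolicyAssignmentId
    $referenceId = $first.PolicyDefinitionReferenceId
    $resourceCount = $group.Count

    Write-Host "Creating remediation for $assignmentId $referenceId ($resourceCount resources)"

    $params = @{
        Name                  = "auto-remediation-$(Get-Date -Format 'yyyyMMdd-HHmmss')"
        ManagementGroupName   = "contoso"
        PolicyAssignmentId    = $assignmentId
        # ReEvaluateCompliance is not supported at management group scope,
        # so remediate the resources already recorded as non-compliant
        ResourceDiscoveryMode = "ExistingNonCompliant"
    }
    if ($referenceId) {
        # Required when the assignment is an initiative (policy set)
        $params.PolicyDefinitionReferenceId = $referenceId
    }

    # Create the remediation task
    Start-AzPolicyRemediation @params

    # Pause between remediations to avoid throttling
    Start-Sleep -Seconds 30
}
```
Layer 4: Compliance visibility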
Without visibility, you’re just hoping your policies work. This layer is how you actually know.
Azure Policy compliance dashboard
Built-in compliance view shows:
- Overall compliance percentage
- Non-compliant resources by policy
- Trend over time
Access at: Azure Portal → Policy → Compliance
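If you compute a compliance percentage yourself, count resources rather than policy states: one resource is evaluated by every policy that applies to it, so naive state counts overstate the total. A minimal Python sketch, assuming records shaped like Get-AzPolicyState output with hypothetical resource IDs, and treating a resource as non-compliant if any of its states is NonCompliant. It ignores other states such as Exempt and Unknown, so it may not match the portal's figure exactly:

```python
from collections import defaultdict


def resource_compliance(states: list) -> dict:
    """Collapse per-policy states into per-resource compliance counts."""
    by_resource = defaultdict(set)
    for s in states:
        by_resource[s["ResourceId"]].add(s["ComplianceState"])

    non_compliant = sum(1 for st in by_resource.values() if "NonCompliant" in st)
    compliant = sum(1 for st in by_resource.values()
                    if "Compliant" in st and "NonCompliant" not in st)
    total = compliant + non_compliant
    pct = round(100 * compliant / total, 2) if total else 0.0
    return {"compliant": compliant, "non_compliant": non_compliant, "percentage": pct}


states = [
    {"ResourceId": "vm1", "ComplianceState": "Compliant"},
    {"ResourceId": "vm1", "ComplianceState": "NonCompliant"},  # same VM, second policy
    {"ResourceId": "st1", "ComplianceState": "Compliant"},
    {"ResourceId": "st1", "ComplianceState": "Compliant"},
    {"ResourceId": "kv1", "ComplianceState": "Compliant"},
]
print(resource_compliance(states))
```

Counting states here would report 4 of 5 compliant (80%); counting resources gives 2 of 3, because vm1 fails one of its two policies.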
Custom compliance reporting
Export compliance data for custom dashboards:
```powershell
# Export a compliance summary
# Get-AzPolicyStateSummary returns aggregate counts, not individual resource states
$summary = Get-AzPolicyStateSummary -ManagementGroupName "contoso"

$report = [ordered]@{
    Timestamp             = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    NonCompliantResources = $summary.Results.NonCompliantResources
    NonCompliantPolicies  = $summary.Results.NonCompliantPolicies
}

# For per-resource detail, use Get-AzPolicyState. It returns one record per resource
# per policy, so group by ResourceId before counting resources.
# -Top caps the number of records returned; raise it for large estates.
$allStates = Get-AzPolicyState -ManagementGroupName "contoso" -Top 5000
$byResource = $allStates | Group-Object ResourceId

# A resource is non-compliant if any policy reports NonCompliant for it
$nonCompliant = @($byResource | Where-Object { $_.Group.ComplianceState -contains 'NonCompliant' }).Count
$compliant = @($byResource | Where-Object {
    $_.Group.ComplianceState -contains 'Compliant' -and
    $_.Group.ComplianceState -notcontains 'NonCompliant'
}).Count
$total = $compliant + $nonCompliant

$report["TotalResources"] = $total
$report["CompliantResources"] = $compliant
$report["CompliancePercentage"] = if ($total -gt 0) { [math]::Round($compliant / $total * 100, 2) } else { 0 }

$report | ConvertTo-Json | Out-File "compliance-report-$(Get-Date -Format 'yyyyMMdd').json"
```
Log Analytics integration
Stream policy events to Log Analytics:
This example uses azapi_resource, which calls the Microsoft.Insights/diagnosticSettings API directly at subscription scope:
```hcl
data "azurerm_subscription" "current" {}

resource "azapi_resource" "subscription_diagnostic_setting" {
  type      = "Microsoft.Insights/diagnosticSettings@2021-05-01-preview"
  name      = "policy-to-log-analytics"
  parent_id = data.azurerm_subscription.current.id

  # azapi 1.x syntax; azapi 2.x accepts the object directly without jsonencode()
  body = jsonencode({
    properties = {
      workspaceId = azurerm_log_analytics_workspace.governance.id
      logs = [
        {
          category = "Policy"
          enabled  = true
        },
        {
          category = "Administrative"
          enabled  = true
        }
      ]
    }
  })
}
```
Query non-compliant resources:
```kusto
AzureActivity
| where CategoryValue == "Policy"
| where OperationNameValue in~ ("Microsoft.Authorization/policies/audit/action", "Microsoft.Authorization/policies/deny/action")
| extend PolicyProps = parse_json(Properties)
// The policies property is stored as a JSON-encoded string, so parse it a second time
| extend Policies = parse_json(tostring(PolicyProps.policies))
| summarize Count = count() by PolicyDefinitionName = tostring(Policies[0].policyDefinitionName), Resource = _ResourceId
| order by Count desc
```
Alerting on compliance changes
```hcl
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "compliance_drop" {
  name                = "compliance-drop-alert"
  resource_group_name = azurerm_resource_group.governance.name
  location            = azurerm_resource_group.governance.location

  evaluation_frequency = "PT1H"
  window_duration      = "PT1H"
  scopes               = [azurerm_log_analytics_workspace.governance.id]
  severity             = 2 # Required: 0 (critical) to 4 (verbose)

  criteria {
    query = <<-QUERY
      AzureActivity
      | where CategoryValue == "Policy"
      | where OperationNameValue =~ "Microsoft.Authorization/policies/audit/action"
      | summarize NonCompliantCount = count() by bin(TimeGenerated, 1h)
      | where NonCompliantCount > 100 // Adjust this threshold to your environment baseline
    QUERY

    # The query returns rows only for hours over the threshold, so fire on any row
    time_aggregation_method = "Count"
    threshold               = 0
    operator                = "GreaterThan"
  }

  action {
    action_groups = [azurerm_monitor_action_group.governance_alerts.id]
  }
}
```
Putting it all together: implementation roadmap
Phase 1: Foundation (Weeks 1-2)
Goals:
- Establish management group hierarchy
- Deploy baseline policies in Audit mode
- Configure compliance monitoring
Actions:
- Create management group structure
- Deploy EPAC with DEV environment only
- Assign core security policies (Audit effect)
- Set up Log Analytics for policy events
Phase 2: Enforcement (Weeks 3-4)
Goals:
- Move validated policies to Deny
- Implement remediation pipelines
- Add deployment gates
Actions:
- Review Phase 1 compliance data
- Update high-compliance policies to Deny
- Deploy remediation automation
- Create approval gates for policy changes
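The remediation automation from Layer 2 has to batch its work, because one remediation task targets a single assignment and, when that assignment is an initiative, a single policy definition reference within it. A minimal Python sketch of that batching, assuming records with the same field names as Get-AzPolicyState output; the short assignment IDs are hypothetical stand-ins for full resource IDs:

```python
from __future__ import annotations

from collections import Counter

REMEDIABLE = {"deployifnotexists", "modify"}


def state(assignment: str, action: str, ref: str | None = None,
          compliance: str = "NonCompliant") -> dict:
    """Build a record shaped like the relevant fields of a Get-AzPolicyState result."""
    return {
        "PolicyAssignmentId": assignment,
        "PolicyDefinitionReferenceId": ref,
        "PolicyDefinitionAction": action,
        "ComplianceState": compliance,
    }


def remediation_batches(states: list[dict]) -> list[tuple[str, str, int]]:
    """One (assignment, reference ID, resource count) tuple per remediation task, largest first.

    An empty reference ID means the assignment is a single policy, not an initiative.
    """
    counts = Counter(
        (s["PolicyAssignmentId"], s["PolicyDefinitionReferenceId"] or "")
        for s in states
        if s["ComplianceState"] == "NonCompliant"
        and s["PolicyDefinitionAction"].lower() in REMEDIABLE
    )
    return sorted(
        ((assignment, ref, n) for (assignment, ref), n in counts.items()),
        key=lambda batch: (-batch[2], batch[0], batch[1]),
    )


states = [
    state("asg-diagnostics", "deployIfNotExists", ref="storage"),
    state("asg-diagnostics", "deployIfNotExists", ref="storage"),
    state("asg-diagnostics", "deployIfNotExists", ref="keyvault"),
    state("asg-env-tag", "modify"),
    state("asg-deny-public-ip", "deny"),  # Deny cannot be remediated
    state("asg-env-tag", "modify", compliance="Compliant"),
]

for batch in remediation_batches(states):
    print(batch)
```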
Phase 3: Optimization (Weeks 5-8)
Goals:
- Full environment coverage
- Custom policy development
- Advanced monitoring
Actions:
- Promote to Non-Prod and Production
- Develop organization-specific policies
- Build compliance dashboards
- Implement alerting
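The alert rule in Layer 4 counts audit events per hour and fires when a bin exceeds a threshold. A minimal Python sketch of the same binning logic, useful for choosing a threshold from historical data before deploying the rule. It assumes naive UTC timestamps; the event data is illustrative, and suggest_threshold is a hypothetical heuristic, not an Azure feature:

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps: list) -> Counter:
    """Equivalent of KQL `summarize count() by bin(TimeGenerated, 1h)`."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


def breaching_hours(timestamps: list, threshold: int) -> list:
    """Hours whose event count is strictly greater than the threshold."""
    counts = hourly_counts(timestamps)
    return sorted(hour for hour, n in counts.items() if n > threshold)


def suggest_threshold(timestamps: list, headroom: float = 1.5) -> int:
    """Hypothetical heuristic: peak hourly volume in a baseline period times a headroom factor."""
    counts = hourly_counts(timestamps)
    return int(max(counts.values(), default=0) * headroom)


# 30 audit events in the 09:00 hour, 120 in the 10:00 hour
events = [datetime(2026, 1, 5, 9, 15)] * 30 + [datetime(2026, 1, 5, 10, 40)] * 120
print(breaching_hours(events, threshold=100))
print(suggest_threshold(events))
```

Run the baseline over a representative period (including a release window) so routine spikes don't page anyone.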
Phase 4: Maturity (Ongoing)
Goals:
- Continuous improvement
- Policy lifecycle management
- Compliance reporting
Actions:
- Quarterly policy reviews
- Exemption audits
- New Azure feature policy coverage
- Stakeholder reporting
Common governance patterns
Pattern: tag inheritance
Automatically inherit tags from resource groups:
```json
{
  "displayName": "Inherit CostCenter tag from resource group",
  "mode": "Indexed",
  "policyRule": {
    "if": {
      "allOf": [
        { "field": "tags['CostCenter']", "exists": "false" },
        { "value": "[resourceGroup().tags['CostCenter']]", "notEquals": "" }
      ]
    },
    "then": {
      "effect": "Modify",
      "details": {
        "roleDefinitionIds": [
          "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
        ],
        "operations": [
          {
            "operation": "addOrReplace",
            "field": "tags['CostCenter']",
            "value": "[resourceGroup().tags['CostCenter']]"
          }
        ]
      }
    }
  }
}
```
Indexed mode limits evaluation to resource types that support tags.
Pattern: deny with policy exemptions
Use Azure Policy exemptions for controlled exceptions rather than tag-based bypasses. Tag-based approaches are insecure because any user with “Tag Contributor” permissions can add the bypass tag and circumvent the deny policy.
The correct approach is to use EPAC-managed policy exemptions with expiry dates and approval metadata:
```json
{
  "exemptions": [
    {
      "name": "public-ip-legacy-app",
      "displayName": "Legacy app public IP - approved exception",
      "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>",
      "policyAssignmentId": "/providers/Microsoft.Management/managementGroups/contoso/providers/Microsoft.Authorization/policyAssignments/deny-public-ip",
      "exemptionCategory": "Waiver",
      "expiresOn": "2026-06-01T00:00:00Z",
      "metadata": {
        "approvedBy": "security-review-board",
        "ticketReference": "SEC-2026-0042",
        "reviewDate": "2026-05-01"
      }
    }
  ]
}
```
The scope value is a placeholder for the subscription, resource group, or resource being exempted. This approach ensures exceptions are tracked in source control, require RBAC permissions to create, and stop applying automatically once they expire.
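Expiry dates only help if someone reviews them, which is what the exemption audits in Phase 4 of the roadmap are for. A minimal Python sketch that flags expired exemptions, exemptions past their review date, and exemptions with no expiry at all. It assumes each exemption is a dict carrying the same name, expiresOn, and metadata.reviewDate fields as the example above; the second exemption is hypothetical:

```python
from __future__ import annotations

from datetime import datetime, timezone


def parse_utc(value: str) -> datetime:
    """Parse '2026-06-01T00:00:00Z' or '2026-05-01' as a timezone-aware UTC datetime."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def audit_exemptions(exemptions: list[dict], now: datetime) -> dict[str, list[str]]:
    """Sort exemptions into expired, review-due, and no-expiry buckets."""
    report: dict[str, list[str]] = {"expired": [], "review_due": [], "no_expiry": []}
    for ex in exemptions:
        name = ex["name"]
        expires = ex.get("expiresOn")
        review = ex.get("metadata", {}).get("reviewDate")
        if expires is None:
            report["no_expiry"].append(name)
        elif parse_utc(expires) <= now:
            report["expired"].append(name)
            continue  # already expired, so a separate review flag adds nothing
        if review and parse_utc(review) <= now:
            report["review_due"].append(name)
    return report


exemptions = [
    {"name": "public-ip-legacy-app", "expiresOn": "2026-06-01T00:00:00Z",
     "metadata": {"reviewDate": "2026-05-01"}},
    {"name": "hypothetical-no-expiry", "metadata": {}},
]
print(audit_exemptions(exemptions, datetime(2026, 5, 15, tzinfo=timezone.utc)))
```

Exemptions without an expiry deserve the most scrutiny, since nothing forces them back into review.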
Pattern: environment-aware policies
Different enforcement by environment tag:
```json
{
  "policyRule": {
    "if": {
      "allOf": [
        { "field": "tags['Environment']", "equals": "Production" },
        { "field": "type", "equals": "Microsoft.Compute/virtualMachines" },
        {
          "field": "Microsoft.Compute/virtualMachines/storageProfile.osDisk.managedDisk.storageAccountType",
          "notIn": ["Premium_LRS", "Premium_ZRS"]
        }
      ]
    },
    "then": {
      "effect": "deny"
    }
  }
}
```
Key takeaways
- A well-designed management group hierarchy makes everything else possible. Get that wrong and you’ll fight the structure forever.
- Layer your controls: Structure, then Remediation, then Enforcement, then Visibility. Each layer builds on the previous one.
- Start with Audit, graduate to Deny. Understand your current state before blocking deployments.
- DeployIfNotExists and Modify effects scale where manual remediation never will.
- You can’t improve what you can’t measure. Put monitoring in place early.
- Plan for continuous evolution. Azure adds features constantly, and your organization’s needs will change too.
Sources
- Microsoft, “Cloud Adoption Framework - Management Group Design,” https://learn.microsoft.com/azure/cloud-adoption-framework/ready/landing-zone/design-area/resource-org-management-groups
- Microsoft, “Azure Policy Effects,” https://learn.microsoft.com/azure/governance/policy/concepts/effects
- Microsoft, “Remediate Non-Compliant Resources with Azure Policy,” https://learn.microsoft.com/azure/governance/policy/how-to/remediate-resources
- Microsoft, “Azure Policy Compliance Data,” https://learn.microsoft.com/azure/governance/policy/how-to/get-compliance-data
- Microsoft, “Azure Landing Zone Policies,” https://learn.microsoft.com/azure/cloud-adoption-framework/ready/landing-zone/design-area/governance
- Microsoft, “Azure Monitor Diagnostic Settings,” https://learn.microsoft.com/azure/azure-monitor/essentials/diagnostic-settings