Introduction

We’ve spent three weeks deep in EPAC territory—learning the tool, building pipelines, and mastering advanced patterns. Now it’s time to zoom out.

A governance framework isn’t just policies. It’s the entire system: the hierarchy that organizes your Azure estate, the policies that enforce standards, the monitoring that tracks compliance, and the automation that fixes drift.

This week, we’re building that system with designs you can actually implement. Management group design drives everything else, and the approach is always the same: start with audit, automate remediation, then enforce.

Governance framework architecture

I think about governance in four layers:

┌──────────────────────────────────────────────────────────────────┐
│ Layer 4: Visibility │
│ Compliance dashboards, reporting, alerting │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ Layer 3: Enforcement │
│ Policy assignments, deny effects, audit effects │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ Layer 2: Remediation │
│ DeployIfNotExists, Modify effects, remediation tasks │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ Layer 1: Structure │
│ Management groups, subscriptions, naming conventions │
└──────────────────────────────────────────────────────────────────┘

Here’s how to build each one.

Layer 1: Management group structure

Your management group hierarchy is the foundation. Get it wrong, and everything above it becomes harder.

Tenant Root Group
├── Intermediate Root (your-org-name)
│   │
│   ├── Platform
│   │   ├── Management        # Log Analytics, Automation
│   │   ├── Connectivity      # Hub VNets, Firewalls, DNS
│   │   └── Identity          # Domain Controllers, AD DS
│   │
│   ├── Landing Zones
│   │   ├── Corp              # Internal applications
│   │   │   ├── Corp-Prod
│   │   │   └── Corp-NonProd
│   │   └── Online            # Internet-facing applications
│   │       ├── Online-Prod
│   │       └── Online-NonProd
│   │
│   ├── Sandbox               # Experimentation, no connectivity
│   │
│   └── Decommissioned        # Subscriptions pending deletion

Why this structure works

Level             | Purpose                     | Policy focus
------------------+-----------------------------+------------------------------------
Intermediate Root | Organization-wide standards | Security baselines, allowed regions
Platform          | Shared services             | Stricter controls, limited access
Landing Zones     | Workload hosting            | Workload-specific policies
Corp/Online       | Connectivity type           | Network policies, public exposure
Prod/NonProd      | Environment                 | Deployment gates, cost controls
Sandbox           | Experimentation             | Minimal policies, cost limits only

Implementation with Terraform

# Management Group Hierarchy

# The tenant root group's name is the tenant ID, so look it up from the current credentials
data "azurerm_client_config" "current" {}

data "azurerm_management_group" "tenant_root" {
  name = data.azurerm_client_config.current.tenant_id
}

resource "azurerm_management_group" "intermediate_root" {
  display_name               = "Contoso"
  parent_management_group_id = data.azurerm_management_group.tenant_root.id
}

resource "azurerm_management_group" "platform" {
  display_name               = "Platform"
  parent_management_group_id = azurerm_management_group.intermediate_root.id
}

resource "azurerm_management_group" "landing_zones" {
  display_name               = "Landing Zones"
  parent_management_group_id = azurerm_management_group.intermediate_root.id
}

resource "azurerm_management_group" "corp" {
  display_name               = "Corp"
  parent_management_group_id = azurerm_management_group.landing_zones.id
}

resource "azurerm_management_group" "online" {
  display_name               = "Online"
  parent_management_group_id = azurerm_management_group.landing_zones.id
}

resource "azurerm_management_group" "sandbox" {
  display_name               = "Sandbox"
  parent_management_group_id = azurerm_management_group.intermediate_root.id
}

resource "azurerm_management_group" "decommissioned" {
  display_name               = "Decommissioned"
  parent_management_group_id = azurerm_management_group.intermediate_root.id
}
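Before applying a hierarchy like this, it can help to sanity-check its shape. The sketch below is a minimal Python model, not part of any Azure SDK: the `PARENTS` map and the function names are hypothetical, and the six-level limit reflects Azure's documented maximum depth for management groups (excluding the root level and the subscription level).

```python
# Hypothetical model of the hierarchy above: child display name -> parent display name.
ROOT = "Tenant Root Group"
PARENTS = {
    "Contoso": ROOT,
    "Platform": "Contoso",
    "Landing Zones": "Contoso",
    "Corp": "Landing Zones",
    "Online": "Landing Zones",
    "Sandbox": "Contoso",
    "Decommissioned": "Contoso",
}
MAX_DEPTH = 6  # documented Azure limit, not counting the root or subscription level


def path_to_root(group: str) -> list[str]:
    """Return [group, parent, ..., root]; raise on unknown parents or cycles."""
    path = [group]
    while path[-1] != ROOT:
        parent = PARENTS.get(path[-1])
        if parent is None:
            raise KeyError(f"{path[-1]!r} has no parent defined")
        if parent in path:
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
    return path


def depth(group: str) -> int:
    """Number of levels between the group and the tenant root."""
    return len(path_to_root(group)) - 1


def too_deep() -> list[str]:
    """Groups that would exceed the depth limit."""
    return [g for g in PARENTS if depth(g) > MAX_DEPTH]


if __name__ == "__main__":
    for name in PARENTS:
        print(f"{name}: depth {depth(name)}")
    print("Too deep:", too_deep())
```

Adding Prod/NonProd groups under Corp and Online puts them at depth 4, which still leaves headroom under the limit.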

Layer 2: Policy assignment strategy

Policies should flow down the hierarchy, with more specific controls at lower levels.

Assignment hierarchy

Intermediate Root (Contoso):
# Universal security standards
- Allowed Locations (Deny)
- Require Resource Tags (Deny)
- Audit Unencrypted Storage (Audit)
- Deploy Defender for Cloud (DeployIfNotExists)
Platform:
# Strict platform controls
- Inherit all from parent
- Deny Public IP Addresses (Deny)
- Require Private Endpoints (Deny)
- Deploy Diagnostic Settings (DeployIfNotExists)
Landing Zones:
# Workload standards
- Inherit all from parent
- Audit Resources Without Tags (Audit)
- Deploy Activity Log Settings (DeployIfNotExists)
Corp:
# Internal workload controls
- Inherit all from parent
- Deny Internet Outbound without Firewall (Deny)
Online:
# Internet-facing controls
- Inherit all from parent
- Require WAF on Application Gateway (Deny)
- Audit Missing DDoS Protection (Audit)
Sandbox:
# Minimal controls
- Budget Alerts Only (DeployIfNotExists)
- Audit Expensive SKUs (Audit)
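# Sandbox still inherits from the Intermediate Root, so explicitly EXCLUDE it
# from the security deny policies there (e.g., via notScopes on those assignments)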
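To make the inheritance rule concrete, here is a small Python sketch of how the effective set of assignments at a management group could be computed. It is a toy model under stated assumptions: the dictionaries are an abridged copy of the list above, and `NOT_SCOPES` is my hypothetical stand-in for the exclusion list an assignment carries. It does not call Azure.

```python
# Abridged, hypothetical model of the assignment hierarchy above.
PARENTS = {
    "Contoso": None,
    "Platform": "Contoso",
    "Landing Zones": "Contoso",
    "Corp": "Landing Zones",
    "Online": "Landing Zones",
    "Sandbox": "Contoso",
}

ASSIGNMENTS = {
    "Contoso": {
        "Allowed Locations": "Deny",
        "Require Resource Tags": "Deny",
        "Audit Unencrypted Storage": "Audit",
    },
    "Landing Zones": {"Audit Resources Without Tags": "Audit"},
    "Corp": {"Deny Internet Outbound without Firewall": "Deny"},
    "Sandbox": {"Audit Expensive SKUs": "Audit"},
}

# (assigned scope, assignment name) -> scopes excluded from that assignment.
NOT_SCOPES = {
    ("Contoso", "Allowed Locations"): {"Sandbox"},
    ("Contoso", "Require Resource Tags"): {"Sandbox"},
}


def ancestors(group):
    """Return the chain [group, parent, ..., root]."""
    chain = []
    while group is not None:
        chain.append(group)
        group = PARENTS[group]
    return chain


def effective_assignments(group):
    """Collect assignments from the root down, skipping excluded scopes."""
    chain = ancestors(group)
    result = {}
    for scope in reversed(chain):  # root first, most specific last
        for name, effect in ASSIGNMENTS.get(scope, {}).items():
            excluded = NOT_SCOPES.get((scope, name), set())
            # An exclusion applies if the group or any of its ancestors is listed.
            if excluded.isdisjoint(chain):
                result[name] = effect
    return result


if __name__ == "__main__":
    for mg in ("Corp", "Sandbox"):
        print(mg, effective_assignments(mg))
```

In this model Corp ends up with five assignments (three inherited from the root, one from Landing Zones, one of its own), while Sandbox keeps only the audit policies.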

Policy effect selection guide

Scenario                          | Effect            | When to use
----------------------------------+-------------------+--------------------------------------------
Block non-compliant deployments   | Deny              | Well-established standards, critical security
Track compliance without blocking | Audit             | New policies, understanding current state
Auto-configure resources          | DeployIfNotExists | Monitoring, logging, security agents
Modify existing properties        | Modify            | Tags, network settings, encryption
Block specific actions (delete)   | DenyAction        | Prevent deletion of critical resources

Example: tiered enforcement

Start with Audit, move to Deny:

{
"assignment": {
"name": "require-storage-https",
"displayName": "Require HTTPS for Storage Accounts"
},
"parameters": {
"effect": "Audit" // Phase 1: Audit
},
"metadata": {
"enforcementPhase": "1-audit",
"denyDate": "2026-03-01",
"notes": "Move to Deny after 60 days of compliance monitoring"
}
}

After compliance reaches target:

{
"parameters": {
"effect": "Deny" // Phase 2: Enforce
},
"metadata": {
"enforcementPhase": "2-deny",
"auditCompletedDate": "2026-02-28",
"complianceAtEnforcement": "98.5%"
}
}
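The promotion decision itself is easy to automate once the criteria are written down. The following Python sketch is one way to express it; the 95% target is an assumed threshold (the example above only records the compliance reached at enforcement), and `next_effect` is a hypothetical helper, not an EPAC feature.

```python
from datetime import date


def next_effect(current_effect, compliance_pct, deny_date, today, target_pct=95.0):
    """Return the effect an assignment should carry after a review.

    An Audit assignment is promoted to Deny only when the planned deny date
    has arrived AND measured compliance meets the target. Anything that is
    not currently Audit is left unchanged.
    """
    if current_effect != "Audit":
        return current_effect
    if today >= deny_date and compliance_pct >= target_pct:
        return "Deny"
    return "Audit"


if __name__ == "__main__":
    planned = date(2026, 3, 1)  # matches "denyDate" in the metadata above
    print(next_effect("Audit", 98.5, planned, date(2026, 3, 1)))   # ready
    print(next_effect("Audit", 80.0, planned, date(2026, 3, 1)))   # compliance too low
    print(next_effect("Audit", 98.5, planned, date(2026, 2, 1)))   # too early
```

Requiring both conditions keeps a date on the calendar from forcing a Deny onto an estate that is not ready for it.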

Layer 3: Remediation architecture

Policies with DeployIfNotExists and Modify effects can automatically fix non-compliant resources.

DeployIfNotExists pattern

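Use for deploying additional resources (e.g., diagnostic settings). The rule below is abridged: it assumes a logAnalyticsWorkspaceId parameter declared in the policy definition's parameters section. Also validate the log categories against your target resource type. As far as I know, StorageRead and StorageWrite are exposed on the individual storage services (blob, file, queue, table) rather than on the storage account itself, which only offers the Transaction metric.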

{
"policyRule": {
"if": {
"field": "type",
"equals": "Microsoft.Storage/storageAccounts"
},
"then": {
"effect": "DeployIfNotExists",
"details": {
"type": "Microsoft.Insights/diagnosticSettings",
"name": "storage-diagnostics",
"existenceCondition": {
"allOf": [
{
"field": "Microsoft.Insights/diagnosticSettings/logs.enabled",
"equals": "true"
}
]
},
"roleDefinitionIds": [
"/providers/Microsoft.Authorization/roleDefinitions/749f88d5-cbae-40b8-bcfc-e573ddc772fa",
"/providers/Microsoft.Authorization/roleDefinitions/92aaf0da-9dab-42b6-94a3-d43ce8d16293"
],
"deployment": {
"properties": {
"mode": "incremental",
"template": {
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"storageAccountName": {
"type": "string"
},
"logAnalyticsWorkspaceId": {
"type": "string"
}
},
"resources": [
{
"type": "Microsoft.Storage/storageAccounts/providers/diagnosticSettings",
"apiVersion": "2021-05-01-preview",
"name": "[concat(parameters('storageAccountName'), '/Microsoft.Insights/storage-diagnostics')]",
"properties": {
"workspaceId": "[parameters('logAnalyticsWorkspaceId')]",
"logs": [
{
"category": "StorageRead",
"enabled": true
},
{
"category": "StorageWrite",
"enabled": true
}
],
"metrics": [
{
"category": "Transaction",
"enabled": true
}
]
}
}
]
},
"parameters": {
"storageAccountName": {
"value": "[field('name')]"
},
"logAnalyticsWorkspaceId": {
"value": "[parameters('logAnalyticsWorkspaceId')]"
}
}
}
}
}
}
}
}

Modify pattern

Use for changing properties on existing resources (e.g., tags):

{
"policyRule": {
"if": {
"allOf": [
{
"field": "type",
"equals": "Microsoft.Resources/subscriptions/resourceGroups"
},
{
"field": "tags['Environment']",
"exists": "false"
}
]
},
"then": {
"effect": "Modify",
"details": {
"roleDefinitionIds": [
"/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
],
"operations": [
{
"operation": "addOrReplace",
"field": "tags['Environment']",
"value": "[if(contains(field('name'), 'prod'), 'Production', 'Development')]"
}
]
}
}
}
}
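One subtlety in the value expression above: the template `contains()` function compares strings case-sensitively and matches substrings. The Python sketch below mirrors that behavior so the edge cases are easy to see. `environment_tag_strict` is a hypothetical alternative that assumes hyphen-separated resource group names; it is not something the policy above does.

```python
def environment_tag(resource_group_name: str) -> str:
    """Mirror of the policy expression: case-sensitive substring match on 'prod'."""
    return "Production" if "prod" in resource_group_name else "Development"


def environment_tag_strict(resource_group_name: str) -> str:
    """Hypothetical stricter rule: match 'prod' as a whole hyphen-separated segment."""
    segments = resource_group_name.lower().split("-")
    return "Production" if "prod" in segments else "Development"


if __name__ == "__main__":
    for name in ("rg-app-prod", "RG-APP-PROD", "rg-app-nonprod"):
        print(name, environment_tag(name), environment_tag_strict(name))
```

With the substring rule, `RG-APP-PROD` is tagged Development and `rg-app-nonprod` is tagged Production, so it is worth checking the rule against your naming convention before relying on it.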

Remediation task automation

Automate remediation with a scheduled pipeline:

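# Get non-compliant policy states for remediable effects (DeployIfNotExists and Modify)
$nonCompliantStates = Get-AzPolicyState `
    -ManagementGroupName "contoso" `
    -Filter "ComplianceState eq 'NonCompliant' and (PolicyDefinitionAction eq 'deployifnotexists' or PolicyDefinitionAction eq 'modify')" `
    -Top 1000

# Group by assignment and, for initiatives, by the policy's reference ID within the set
$grouped = $nonCompliantStates | Group-Object PolicyAssignmentId, PolicyDefinitionReferenceId

foreach ($group in $grouped) {
    $assignmentId  = $group.Group[0].PolicyAssignmentId
    $referenceId   = $group.Group[0].PolicyDefinitionReferenceId
    $resourceCount = $group.Count
    Write-Host "Creating remediation for $assignmentId ($resourceCount resources)"

    # Default discovery mode (ExistingNonCompliant) is used here; to my knowledge,
    # ReEvaluateCompliance is not supported at management group scope
    $params = @{
        Name                = "auto-remediation-$(Get-Date -Format 'yyyyMMdd-HHmmss')"
        ManagementGroupName = "contoso"
        PolicyAssignmentId  = $assignmentId
    }
    # Initiative assignments need the reference ID of the individual policy to remediate
    if ($referenceId) { $params.PolicyDefinitionReferenceId = $referenceId }

    # Create remediation task
    Start-AzPolicyRemediation @params

    # Add delay between remediations to avoid throttling
    Start-Sleep -Seconds 30
}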
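The grouping step deserves a closer look, because an initiative assignment can contain several remediable policies and each one needs its own remediation task. Here is a minimal Python sketch of that planning logic over sample records. The record shape and field names are hypothetical, chosen for illustration only.

```python
from collections import defaultdict

REMEDIABLE_EFFECTS = {"deployifnotexists", "modify"}


def plan_remediations(states):
    """Group non-compliant, remediable resources by (assignment, reference ID).

    Each key in the result corresponds to one remediation task to create.
    The reference ID is None for assignments of a single policy definition.
    """
    groups = defaultdict(list)
    for state in states:
        if state["complianceState"] != "NonCompliant":
            continue
        if state["effect"].lower() not in REMEDIABLE_EFFECTS:
            continue
        key = (state["assignmentId"], state.get("referenceId") or None)
        groups[key].append(state["resourceId"])
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"assignmentId": "a1", "referenceId": "diag", "resourceId": "r1",
         "complianceState": "NonCompliant", "effect": "deployIfNotExists"},
        {"assignmentId": "a1", "referenceId": "tags", "resourceId": "r2",
         "complianceState": "NonCompliant", "effect": "modify"},
        {"assignmentId": "a2", "referenceId": None, "resourceId": "r3",
         "complianceState": "NonCompliant", "effect": "audit"},
    ]
    for (assignment, reference), resources in plan_remediations(sample).items():
        print(assignment, reference, resources)
```

Audit-only findings drop out of the plan because there is nothing for a remediation task to deploy or modify.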

Layer 4: Compliance visibility

Without visibility, you’re just hoping your policies work. This layer is how you actually know.

Azure Policy compliance dashboard

Built-in compliance view shows:

  • Overall compliance percentage
  • Non-compliant resources by policy
  • Trend over time

Access at: Azure Portal → Policy → Compliance

Custom compliance reporting

Export compliance data for custom dashboards:

# Export compliance summary
$summary = Get-AzPolicyStateSummary -ManagementGroupName "contoso"

# ResourceDetails holds one entry per compliance state, each with its own Count,
# so totals are sums over those entries (assumes the Az.PolicyInsights result shape)
$details = $summary.Results.ResourceDetails
$total = ($details | Measure-Object -Property Count -Sum).Sum
$compliant = ($details | Where-Object { $_.ComplianceState -eq 'Compliant' } |
    Measure-Object -Property Count -Sum).Sum
$nonCompliant = ($details | Where-Object { $_.ComplianceState -eq 'NonCompliant' } |
    Measure-Object -Property Count -Sum).Sum

# Guard against an empty scope to avoid dividing by zero
$percentage = if ($total -gt 0) { [math]::Round($compliant / $total * 100, 2) } else { 0 }

$report = [ordered]@{
    Timestamp             = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    TotalResources        = $total
    CompliantResources    = $compliant
    NonCompliantResources = $nonCompliant
    CompliancePercentage  = $percentage
}
$report | ConvertTo-Json | Out-File "compliance-report-$(Get-Date -Format 'yyyyMMdd').json"
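The arithmetic behind the report is worth spelling out, since it is easy to count the number of compliance-state buckets instead of the resources inside them. This Python sketch assumes the summary exposes resource details as a list of per-state buckets, each with a state and a count, which is how I understand the summarize response. The field names are illustrative.

```python
def compliance_report(resource_details):
    """Aggregate per-state buckets into totals and a compliance percentage."""
    counts = {}
    for entry in resource_details:
        state = entry["complianceState"].lower()
        counts[state] = counts.get(state, 0) + entry["count"]

    total = sum(counts.values())
    compliant = counts.get("compliant", 0)
    # Guard against an empty scope to avoid dividing by zero.
    percentage = round(compliant / total * 100, 2) if total else 0.0
    return {
        "totalResources": total,
        "compliantResources": compliant,
        "nonCompliantResources": counts.get("noncompliant", 0),
        "compliancePercentage": percentage,
    }


if __name__ == "__main__":
    buckets = [
        {"complianceState": "compliant", "count": 197},
        {"complianceState": "noncompliant", "count": 3},
    ]
    print(compliance_report(buckets))
```

With 197 compliant resources out of 200, the sketch reports 98.5%, the same figure used in the enforcement metadata example earlier.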

Log Analytics integration

Stream policy events to Log Analytics:

# Diagnostic setting for Azure Policy events
# Assumes azurerm_log_analytics_workspace.governance is defined elsewhere
data "azurerm_subscription" "current" {}

resource "azurerm_monitor_diagnostic_setting" "policy_events" {
  name                       = "policy-to-log-analytics"
  target_resource_id         = data.azurerm_subscription.current.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.governance.id

  enabled_log {
    category = "Policy"
  }
}

Query non-compliant resources:

AzureActivity
| where CategoryValue == "Policy"
| where OperationNameValue contains "audit" or OperationNameValue contains "deny"
// Properties_d.policies is typically a JSON-encoded string, so parse it before indexing
| extend Policies = parse_json(tostring(Properties_d.policies))
| summarize Count = count() by
    PolicyDefinitionName = tostring(Policies[0].policyDefinitionName),
    Resource = tostring(Properties_d.resourceId)
| order by Count desc

Alerting on compliance changes

resource "azurerm_monitor_scheduled_query_rules_alert_v2" "compliance_drop" {
  name                 = "compliance-drop-alert"
  resource_group_name  = azurerm_resource_group.governance.name
  location             = azurerm_resource_group.governance.location
  evaluation_frequency = "PT1H"
  window_duration      = "PT1H"
  scopes               = [azurerm_log_analytics_workspace.governance.id]
  severity             = 2 # required; 0 (critical) to 4 (verbose), pick what fits your process

  criteria {
    # =~ compares case-insensitively; operation names are often logged in upper case
    query = <<-QUERY
      AzureActivity
      | where CategoryValue == "Policy"
      | where OperationNameValue =~ "Microsoft.Authorization/policies/audit/action"
      | summarize NonCompliantCount = count() by bin(TimeGenerated, 1h)
      | where NonCompliantCount > 100
    QUERY

    time_aggregation_method = "Count"
    threshold               = 0
    operator                = "GreaterThan"
  }

  action {
    action_groups = [azurerm_monitor_action_group.governance_alerts.id]
  }
}

Putting it all together: implementation roadmap

Phase 1: Foundation (Weeks 1-2)

Goals:

  • Establish management group hierarchy
  • Deploy baseline policies in Audit mode
  • Configure compliance monitoring

Actions:

  1. Create management group structure
  2. Deploy EPAC with DEV environment only
  3. Assign core security policies (Audit effect)
  4. Set up Log Analytics for policy events

Phase 2: Enforcement (Weeks 3-4)

Goals:

  • Move validated policies to Deny
  • Implement remediation pipelines
  • Add deployment gates

Actions:

  1. Review Phase 1 compliance data
  2. Update high-compliance policies to Deny
  3. Deploy remediation automation
  4. Create approval gates for policy changes

Phase 3: Optimization (Weeks 5-8)

Goals:

  • Full environment coverage
  • Custom policy development
  • Advanced monitoring

Actions:

  1. Promote to Non-Prod and Production
  2. Develop organization-specific policies
  3. Build compliance dashboards
  4. Implement alerting

Phase 4: Maturity (Ongoing)

Goals:

  • Continuous improvement
  • Policy lifecycle management
  • Compliance reporting

Actions:

  1. Quarterly policy reviews
  2. Exemption audits
  3. New Azure feature policy coverage
  4. Stakeholder reporting

Common governance patterns

Pattern: tag inheritance

Automatically inherit tags from resource groups:

{
"displayName": "Inherit CostCenter tag from resource group",
"policyRule": {
"if": {
"allOf": [
{
"field": "tags['CostCenter']",
"exists": "false"
},
{
"value": "[resourceGroup().tags['CostCenter']]",
"notEquals": ""
}
]
},
"then": {
"effect": "Modify",
"details": {
"roleDefinitionIds": [
"/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
],
"operations": [
{
"operation": "addOrReplace",
"field": "tags['CostCenter']",
"value": "[resourceGroup().tags['CostCenter']]"
}
]
}
}
}
}
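To see what this rule does to an individual resource, here is a small Python simulation of the condition and the `addOrReplace` operation. It is a toy model of the policy logic with hypothetical names, not an evaluation engine.

```python
def inherit_tag(resource_tags, rg_tags, tag_name="CostCenter"):
    """Simulate the rule: copy the tag from the resource group when it is missing.

    Returns (tags, changed). The resource is only modified when it lacks the
    tag AND the resource group has a non-empty value to inherit.
    """
    rg_value = rg_tags.get(tag_name, "")
    if tag_name in resource_tags or rg_value == "":
        return dict(resource_tags), False

    updated = dict(resource_tags)
    updated[tag_name] = rg_value  # the addOrReplace operation
    return updated, True


if __name__ == "__main__":
    rg = {"CostCenter": "CC-1234"}
    print(inherit_tag({"Owner": "team-a"}, rg))        # tag is added
    print(inherit_tag({"CostCenter": "CC-9999"}, rg))  # existing value is kept
    print(inherit_tag({"Owner": "team-a"}, {}))        # nothing to inherit
```

Because the condition requires the tag to be absent, a resource that already carries its own CostCenter value is never overwritten by the resource group's value.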

Pattern: deny with exceptions

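Let resources that carry an approval tag bypass the deny. Note that this is tag-based, not identity-based: anyone who can set the ApprovedBy tag can bypass the rule, so restrict who can write that tag, or use policy exemptions when you need a stronger control: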

{
"policyRule": {
"if": {
"allOf": [
{
"field": "type",
"equals": "Microsoft.Network/publicIPAddresses"
},
{
"value": "[if(empty(field('tags[''ApprovedBy'']')), 'false', 'true')]",
"equals": "false"
}
]
},
"then": {
"effect": "deny"
}
}
}
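The decision this rule encodes is simple enough to model directly. The Python sketch below mirrors it under the assumption that a missing tag behaves like an empty value. The `evaluate` function and the resource shape are hypothetical.

```python
def evaluate(resource):
    """Return 'deny' for public IP addresses without a non-empty ApprovedBy tag."""
    # Policy type comparisons are case-insensitive, so normalize before comparing.
    is_public_ip = resource["type"].lower() == "microsoft.network/publicipaddresses"
    approved_by = resource.get("tags", {}).get("ApprovedBy", "")
    if is_public_ip and not approved_by:
        return "deny"
    return "allow"


if __name__ == "__main__":
    print(evaluate({"type": "Microsoft.Network/publicIPAddresses", "tags": {}}))
    print(evaluate({"type": "Microsoft.Network/publicIPAddresses",
                    "tags": {"ApprovedBy": "anything"}}))
    print(evaluate({"type": "Microsoft.Storage/storageAccounts"}))
```

The second call shows the limitation: the rule checks only that the tag is non-empty, not who set it or what it says.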

Pattern: environment-aware policies

Different enforcement by environment tag:

{
"policyRule": {
"if": {
"allOf": [
{
"field": "tags['Environment']",
"equals": "Production"
},
{
"field": "type",
"equals": "Microsoft.Compute/virtualMachines"
},
{
"field": "Microsoft.Compute/virtualMachines/storageProfile.osDisk.managedDisk.storageAccountType",
"notIn": ["Premium_LRS", "Premium_ZRS"]
}
]
},
"then": {
"effect": "deny"
}
}
}

Key takeaways

  1. A well-designed management group hierarchy makes everything else possible. Get that wrong and you’ll fight the structure forever.

  2. Layer your controls: Structure, then Remediation, then Enforcement, then Visibility. Each layer builds on the previous one.

  3. Start with Audit, graduate to Deny. Understand your current state before blocking deployments.

  4. DeployIfNotExists and Modify effects scale where manual remediation never will.

  5. You can’t improve what you can’t measure. Put monitoring in place early.

  6. Plan for continuous evolution. Azure adds features constantly, and your organization’s needs will change too.

