Governance, Compliance, and Audit Capabilities
Overview
This document describes how the S3 folder structure provisioned by s3-provisioner-tool supports enterprise-grade governance, compliance, and audit capabilities.
Important: The s3-provisioner tool creates only the folder structure and the baseline bucket configuration. Actual governance, compliance, and audit capabilities must be implemented by:
AWS services (CloudTrail, Config, S3 Access Logs, Macie, etc.)
Your MLOps applications that write to these folders
Third-party governance tools
This document serves as a reference architecture showing how to use the provisioned structure for enterprise requirements.
What the s3-provisioner Tool Provides
The s3-provisioner tool creates:
Folder structure with designated locations for:
  Audit logs (artifacts/metadata/governance/audit_logs/)
  Compliance reports (artifacts/metadata/governance/compliance/)
  Data lineage metadata (artifacts/metadata/governance/lineage/)
  Data quality reports (artifacts/reports/data_quality/)
S3 bucket configuration:
  Versioning (optional)
  Encryption at rest
  Public access blocking
  VPC endpoint support
  Lifecycle policies
  Tagging for governance
CloudFormation templates for reproducible infrastructure
What You Need to Implement
Data governance logic - Your applications must write metadata to governance folders
Compliance monitoring - Configure AWS Config, CloudTrail, or third-party tools
Audit trail generation - Your MLOps pipelines must log events to audit folders
Access control enforcement - Configure IAM policies, bucket policies, SCPs
Data lineage tracking - Your pipelines must write lineage metadata
Data Governance
Metadata Management
Location: artifacts/metadata/governance/
governance/
├── data_catalog/
│   ├── schema_registry.json
│   ├── data_dictionary.json
│   └── lineage_metadata.json
├── policies/
│   ├── data_classification.json
│   ├── access_policies.json
│   └── retention_policies.json
└── audit_logs/
    └── YYYY/MM/DD/
        └── governance_events.jsonl
Key capabilities this layout is designed to support:
Centralized data catalog with schema versioning
Data classification (PII, sensitive, public)
Automated policy enforcement
Change tracking and approval workflows
Data Quality Standards
Location: artifacts/reports/data_quality/
{
  "report_id": "dq_2024_01_01_001",
  "timestamp": "2024-01-01T10:00:00Z",
  "dataset": "raw/customer_data",
  "checks": {
    "completeness": {"score": 0.98, "threshold": 0.95, "status": "PASS"},
    "accuracy": {"score": 0.96, "threshold": 0.90, "status": "PASS"},
    "consistency": {"score": 0.99, "threshold": 0.95, "status": "PASS"},
    "timeliness": {"score": 0.94, "threshold": 0.90, "status": "PASS"}
  },
  "violations": [],
  "approved_by": "<user_id>",
  "compliance_status": "COMPLIANT"
}
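A report of this shape could be assembled by a pipeline step along the following lines. This is a minimal sketch, not part of the s3-provisioner tool: the function name `build_quality_report`, the `FAIL` and `NON_COMPLIANT` values, and the rules "a check passes when score >= threshold" and "any failed check makes the report non-compliant" are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def build_quality_report(report_id, dataset, scores, thresholds, approved_by):
    """Assemble a data quality report shaped like the example above (hypothetical helper)."""
    checks = {}
    for name, score in scores.items():
        threshold = thresholds[name]
        # Assumption: a check passes when its score meets or exceeds the threshold.
        status = "PASS" if score >= threshold else "FAIL"
        checks[name] = {"score": score, "threshold": threshold, "status": status}

    # Assumption: violations are recorded as the names of the failed checks.
    violations = [name for name, check in checks.items() if check["status"] == "FAIL"]
    return {
        "report_id": report_id,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "dataset": dataset,
        "checks": checks,
        "violations": violations,
        "approved_by": approved_by,
        # Assumption: any failed check makes the whole report non-compliant.
        "compliance_status": "COMPLIANT" if not violations else "NON_COMPLIANT",
    }


report = build_quality_report(
    report_id="dq_2024_01_01_001",
    dataset="raw/customer_data",
    scores={"completeness": 0.98, "timeliness": 0.85},
    thresholds={"completeness": 0.95, "timeliness": 0.90},
    approved_by="<user_id>",
)
print(report["compliance_status"], report["violations"])
```

The resulting dictionary can be serialized with `json.dumps` and written under artifacts/reports/data_quality/ by whatever upload mechanism your pipeline already uses.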
Compliance Framework
Regulatory Compliance
The structure is designed to support multiple compliance standards:
| Standard | Scope | Implementation |
|---|---|---|
| GDPR | Data privacy, right to deletion | Encryption, access logs, data retention |
| HIPAA | Healthcare data protection | Encryption at rest/transit, audit trails |
| SOC 2 | Security and availability | Access controls, monitoring, incident response |
| ISO 27001 | Information security | Risk management, security policies |
| CCPA | California consumer privacy | Data inventory, consent management |
Compliance Artifacts
Location: artifacts/metadata/governance/compliance/
compliance/
├── certifications/
│   ├── soc2_report.pdf
│   └── iso27001_certificate.pdf
├── assessments/
│   └── YYYY/MM/
│       └── compliance_assessment.json
├── evidence/
│   └── YYYY/MM/DD/
│       ├── access_reviews.json
│       ├── encryption_verification.json
│       └── backup_verification.json
└── violations/
    └── YYYY/MM/DD/
        └── violation_reports.json
Audit Capabilities
Comprehensive Audit Trail
Each audit event written by your pipelines should carry enough context for full traceability. An example event:
{
  "event_id": "evt_20240101_100000_001",
  "timestamp": "2024-01-01T10:00:00Z",
  "event_type": "DATA_ACCESS",
  "actor": {
    "user_id": "<user_id>",
    "role": "DATA_SCIENTIST",
    "session_id": "<session_id>"
  },
  "resource": {
    "type": "S3_OBJECT",
    "path": "s3://bucket/solutions/master-solution/data/raw/2024/01/01/data.csv",
    "classification": "CONFIDENTIAL"
  },
  "action": "READ",
  "result": "SUCCESS",
  "metadata": {
    "ip_address": "<ip>",
    "user_agent": "boto3/1.26.0",
    "request_id": "<request_id>"
  },
  "compliance_context": {
    "purpose": "MODEL_TRAINING",
    "approval_id": "appr_001",
    "data_usage_agreement": "dua_2024_001"
  }
}
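The event above could be produced by a small helper in your MLOps code. The sketch below is illustrative only: `build_audit_event` and `to_jsonl_line` are hypothetical names, and the `evt_YYYYMMDD_HHMMSS_NNN` identifier format is inferred from the single example above rather than from any documented rule.

```python
import json
from datetime import datetime, timezone


def build_audit_event(seq, actor, resource, action, result,
                      metadata, compliance_context,
                      event_type="DATA_ACCESS", now=None):
    """Build an audit event dict shaped like the example above (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return {
        # Assumption: IDs follow evt_<date>_<time>_<3-digit sequence>, as in the example.
        "event_id": f"evt_{now:%Y%m%d_%H%M%S}_{seq:03d}",
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "event_type": event_type,
        "actor": actor,
        "resource": resource,
        "action": action,
        "result": result,
        "metadata": metadata,
        "compliance_context": compliance_context,
    }


def to_jsonl_line(event):
    """Serialize one event as a single JSON Lines record."""
    return json.dumps(event, separators=(",", ":")) + "\n"


event = build_audit_event(
    seq=1,
    actor={"user_id": "<user_id>", "role": "DATA_SCIENTIST", "session_id": "<session_id>"},
    resource={"type": "S3_OBJECT", "path": "s3://bucket/data/raw/2024/01/01/data.csv",
              "classification": "CONFIDENTIAL"},
    action="READ",
    result="SUCCESS",
    metadata={"ip_address": "<ip>", "request_id": "<request_id>"},
    compliance_context={"purpose": "MODEL_TRAINING", "approval_id": "appr_001"},
    now=datetime(2024, 1, 1, 10, 0, 0, tzinfo=timezone.utc),
)
line = to_jsonl_line(event)
```

Keeping each event on one line means a day's events can be concatenated into a single .jsonl object and parsed record by record.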
Audit Log Locations
artifacts/metadata/governance/audit_logs/
├── access_logs/
│   └── YYYY/MM/DD/
│       └── access_events.jsonl
├── change_logs/
│   └── YYYY/MM/DD/
│       └── change_events.jsonl
├── deployment_logs/
│   └── YYYY/MM/DD/
│       └── deployment_events.jsonl
└── compliance_logs/
    └── YYYY/MM/DD/
        └── compliance_events.jsonl
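A writer needs to turn an event category and a date into one of the keys in this layout. The following is a minimal sketch under stated assumptions: the helper name `audit_log_key` and the short category names (`access`, `change`, `deployment`, `compliance`) are hypothetical, chosen here only to index the four folders shown above.

```python
from datetime import datetime, timezone

AUDIT_ROOT = "artifacts/metadata/governance/audit_logs"

# Assumption: these short category names are illustrative; only the folder
# and file names on the right-hand side come from the layout above.
LOG_FILES = {
    "access": ("access_logs", "access_events.jsonl"),
    "change": ("change_logs", "change_events.jsonl"),
    "deployment": ("deployment_logs", "deployment_events.jsonl"),
    "compliance": ("compliance_logs", "compliance_events.jsonl"),
}


def audit_log_key(category, when):
    """Return the object key for a category's log file on a given day."""
    folder, filename = LOG_FILES[category]
    return f"{AUDIT_ROOT}/{folder}/{when:%Y/%m/%d}/{filename}"


key = audit_log_key("access", datetime(2024, 1, 1, tzinfo=timezone.utc))
print(key)
```

Zero-padded YYYY/MM/DD prefixes keep keys lexicographically sorted by date, which makes prefix listing for a day, month, or year straightforward.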
Audit Query Examples
These queries assume the JSONL audit logs are exposed as tables through a SQL query engine of your choice (for example, Amazon Athena).
Who accessed sensitive data?
SELECT actor.user_id, resource.path, timestamp
FROM audit_logs
WHERE resource.classification = 'CONFIDENTIAL'
  AND action = 'READ'
  AND date >= '2024-01-01';
Track model lineage:
SELECT model_id, training_data_path, training_timestamp, deployed_by
FROM deployment_logs
WHERE model_id = 'model_v1.0.0';
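Where no SQL engine is set up, the first query can be approximated by filtering the JSONL records directly. This is a rough sketch, assuming the records follow the audit event shape shown earlier; `confidential_reads` is a hypothetical helper name, and the date filter compares the first ten characters of the ISO 8601 timestamp as a string.

```python
import json


def confidential_reads(jsonl_lines, since):
    """Return (user_id, path, timestamp) for CONFIDENTIAL reads on or after `since`."""
    rows = []
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines between records
        event = json.loads(line)
        if (event["resource"]["classification"] == "CONFIDENTIAL"
                and event["action"] == "READ"
                # ISO 8601 dates sort correctly as plain strings.
                and event["timestamp"][:10] >= since):
            rows.append((event["actor"]["user_id"],
                         event["resource"]["path"],
                         event["timestamp"]))
    return rows


sample = [
    json.dumps({"timestamp": "2024-01-01T10:00:00Z", "action": "READ",
                "actor": {"user_id": "u1"},
                "resource": {"path": "s3://bucket/a.csv", "classification": "CONFIDENTIAL"}}),
    json.dumps({"timestamp": "2023-12-31T23:00:00Z", "action": "READ",
                "actor": {"user_id": "u2"},
                "resource": {"path": "s3://bucket/b.csv", "classification": "CONFIDENTIAL"}}),
    json.dumps({"timestamp": "2024-01-02T09:00:00Z", "action": "WRITE",
                "actor": {"user_id": "u3"},
                "resource": {"path": "s3://bucket/c.csv", "classification": "CONFIDENTIAL"}}),
]
rows = confidential_reads(sample, since="2024-01-01")
```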
Access Control
Role-Based Access Control (RBAC)
Location: config/environment_configs/rbac/
{
  "roles": {
    "DATA_SCIENTIST": {
      "permissions": [
        "s3:GetObject:data/processed/*",
        "s3:PutObject:notebooks/*",
        "s3:GetObject:models/experiments/*"
      ],
      "restrictions": {
        "no_access": ["data/raw/*", "config/*"],
        "read_only": ["models/registry/production/*"]
      }
    },
    "ML_ENGINEER": {
      "permissions": [
        "s3:*:code/*",
        "s3:*:models/training/*",
        "s3:PutObject:models/registry/staging/*"
      ]
    },
    "ADMIN": {
      "permissions": ["s3:*:*"]
    }
  }
}
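The role file above is a descriptive config, not something S3 evaluates; real enforcement still has to come from IAM policies, bucket policies, or SCPs. To show how the entries might be interpreted, here is a hedged sketch of an evaluator. The function `is_allowed` and its semantics are assumptions: `no_access` always denies, `read_only` permits only `s3:GetObject`, permission strings are read as `service:action:key-pattern`, and anything unmatched is denied.

```python
from fnmatch import fnmatchcase

RBAC = {
    "roles": {
        "DATA_SCIENTIST": {
            "permissions": [
                "s3:GetObject:data/processed/*",
                "s3:PutObject:notebooks/*",
                "s3:GetObject:models/experiments/*",
            ],
            "restrictions": {
                "no_access": ["data/raw/*", "config/*"],
                "read_only": ["models/registry/production/*"],
            },
        },
        "ADMIN": {"permissions": ["s3:*:*"]},
    }
}


def is_allowed(rbac, role, action, key):
    """Evaluate a request against the role config (illustrative semantics only)."""
    spec = rbac["roles"].get(role)
    if spec is None:
        return False  # default deny for unknown roles
    restrictions = spec.get("restrictions", {})
    # Assumption: no_access overrides every grant.
    if any(fnmatchcase(key, p) for p in restrictions.get("no_access", [])):
        return False
    # Assumption: read_only permits s3:GetObject and nothing else.
    if any(fnmatchcase(key, p) for p in restrictions.get("read_only", [])):
        return action == "s3:GetObject"
    for permission in spec.get("permissions", []):
        service, verb, pattern = permission.split(":", 2)
        if fnmatchcase(action, f"{service}:{verb}") and fnmatchcase(key, pattern):
            return True
    return False  # nothing matched: deny


print(is_allowed(RBAC, "DATA_SCIENTIST", "s3:GetObject", "data/processed/train/part-0.parquet"))
print(is_allowed(RBAC, "DATA_SCIENTIST", "s3:GetObject", "data/raw/2024/01/01/data.csv"))
```

An evaluator like this is mainly useful for testing that a role file says what you intend before translating it into IAM policy documents.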
Least Privilege Principle
Default deny all access
Explicit grants per role
Time-bound access tokens
Just-in-time privilege elevation
Data Lineage
Lineage Tracking
Location: artifacts/metadata/governance/lineage/
{
  "lineage_id": "lin_20240101_001",
  "entity_type": "MODEL",
  "entity_id": "model_v1.0.0",
  "lineage_graph": {
    "nodes": [
      {
        "id": "raw_data_001",
        "type": "DATASET",
        "path": "s3://bucket/data/raw/2024/01/01/",
        "timestamp": "2024-01-01T08:00:00Z"
      },
      {
        "id": "processed_data_001",
        "type": "DATASET",
        "path": "s3://bucket/data/processed/train/",
        "timestamp": "2024-01-01T09:00:00Z"
      },
      {
        "id": "model_v1.0.0",
        "type": "MODEL",
        "path": "s3://bucket/models/registry/production/model_v1.0.0/",
        "timestamp": "2024-01-01T12:00:00Z"
      }
    ],
    "edges": [
      {"from": "raw_data_001", "to": "processed_data_001", "transform": "preprocessing_v1"},
      {"from": "processed_data_001", "to": "model_v1.0.0", "transform": "training_job_001"}
    ]
  },
  "compliance_metadata": {
    "data_sources_approved": true,
    "transformations_validated": true,
    "model_approved_for_production": true
  }
}
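Given a lineage record like the one above, a common question is "which upstream datasets fed this model?". The sketch below walks the `edges` list backwards from an entity; `upstream_nodes` is a hypothetical helper, and it assumes the `from`/`to` edge fields shown in the example.

```python
def upstream_nodes(lineage_graph, entity_id):
    """Return IDs of all ancestors of entity_id, nearest first (depth-first walk)."""
    # Index edges by their destination so each node's direct parents are easy to find.
    parents = {}
    for edge in lineage_graph["edges"]:
        parents.setdefault(edge["to"], []).append(edge["from"])

    seen, stack = [], [entity_id]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:  # guards against revisiting shared ancestors
                seen.append(parent)
                stack.append(parent)
    return seen


graph = {
    "edges": [
        {"from": "raw_data_001", "to": "processed_data_001", "transform": "preprocessing_v1"},
        {"from": "processed_data_001", "to": "model_v1.0.0", "transform": "training_job_001"},
    ]
}
ancestors = upstream_nodes(graph, "model_v1.0.0")
print(ancestors)
```

Reversing the direction of the index (keyed by `from` instead of `to`) gives the downstream view used for impact analysis.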
Lineage Visualization
End-to-end data flow tracking
Transformation history
Impact analysis for changes
Reproducibility guarantees
Retention Policies
Data Lifecycle Management
Location: config/environment_configs/retention_policies.json
{
  "policies": [
    {
      "name": "raw_data_retention",
      "path_pattern": "data/raw/**",
      "retention_days": 90,
      "transition_rules": [
        {"days": 30, "storage_class": "STANDARD_IA"},
        {"days": 60, "storage_class": "GLACIER"}
      ],
      "delete_after_days": 90
    },
    {
      "name": "model_artifacts_retention",
      "path_pattern": "models/experiments/**",
      "retention_days": 365,
      "transition_rules": [
        {"days": 90, "storage_class": "STANDARD_IA"}
      ]
    },
    {
      "name": "audit_logs_retention",
      "path_pattern": "artifacts/metadata/governance/audit_logs/**",
      "retention_days": 2555,
      "compliance_requirement": "SOC2_7_YEARS",
      "immutable": true
    },
    {
      "name": "production_models_retention",
      "path_pattern": "models/registry/production/**",
      "retention_days": 1825,
      "compliance_requirement": "REGULATORY_5_YEARS",
      "immutable": true
    }
  ]
}
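One way to act on this file is to translate each policy into an S3 lifecycle rule. The sketch below builds rule dictionaries in the shape accepted by boto3's `put_bucket_lifecycle_configuration` (`LifecycleConfiguration={"Rules": [...]}`). The mapping itself is an assumption, not documented behavior of the s3-provisioner tool: `path_pattern` is reduced to a prefix by stripping the trailing `**`, only `delete_after_days` becomes an expiration, and policies with no transition or deletion (such as the `immutable` ones, which would need S3 Object Lock instead) are skipped.

```python
def to_lifecycle_rules(policies):
    """Translate retention policies into S3 lifecycle rule dicts (illustrative mapping)."""
    rules = []
    for policy in policies:
        # Assumption: "data/raw/**" corresponds to the key prefix "data/raw/".
        prefix = policy["path_pattern"].rstrip("*")
        rule = {"ID": policy["name"], "Filter": {"Prefix": prefix}, "Status": "Enabled"}

        transitions = [
            {"Days": t["days"], "StorageClass": t["storage_class"]}
            for t in policy.get("transition_rules", [])
        ]
        if transitions:
            rule["Transitions"] = transitions
        # Assumption: only an explicit delete_after_days triggers expiration.
        if "delete_after_days" in policy:
            rule["Expiration"] = {"Days": policy["delete_after_days"]}

        # A lifecycle rule needs at least one action; skip policies that define none.
        if "Transitions" in rule or "Expiration" in rule:
            rules.append(rule)
    return rules


policies = [
    {"name": "raw_data_retention", "path_pattern": "data/raw/**", "retention_days": 90,
     "transition_rules": [{"days": 30, "storage_class": "STANDARD_IA"},
                          {"days": 60, "storage_class": "GLACIER"}],
     "delete_after_days": 90},
    {"name": "model_artifacts_retention", "path_pattern": "models/experiments/**",
     "retention_days": 365,
     "transition_rules": [{"days": 90, "storage_class": "STANDARD_IA"}]},
    {"name": "audit_logs_retention",
     "path_pattern": "artifacts/metadata/governance/audit_logs/**",
     "retention_days": 2555, "immutable": True},
]
rules = to_lifecycle_rules(policies)
```

The resulting list could then be passed as `{"Rules": rules}` to the lifecycle API, or compared against what the CloudFormation template already configures.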
Archive Strategy
data/raw/archive/
└── YYYY/
    └── MM/
        └── archived_data_YYYYMMDD.tar.gz
Monitoring and Alerting
Compliance Monitoring
Location: artifacts/reports/monitoring/compliance/
Real-time Alerts:
Unauthorized access attempts
Policy violations
Data quality threshold breaches
Encryption failures
Unusual data access patterns
Alert Configuration:
{
  "alerts": [
    {
      "name": "unauthorized_access",
      "condition": "event_type == 'DATA_ACCESS' AND result == 'DENIED'",
      "severity": "HIGH",
      "notification_channels": ["security_team", "compliance_officer"]
    },
    {
      "name": "pii_access_without_approval",
      "condition": "resource.classification == 'PII' AND compliance_context.approval_id IS NULL",
      "severity": "CRITICAL",
      "notification_channels": ["security_team", "legal_team"]
    },
    {
      "name": "data_quality_failure",
      "condition": "data_quality.score < threshold AND dataset.classification == 'PRODUCTION'",
      "severity": "MEDIUM",
      "notification_channels": ["data_engineering_team"]
    }
  ]
}
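The `condition` strings above read as pseudo-expressions; no evaluator for them is described here. As a rough sketch, the first two conditions can be hand-translated into Python predicates keyed by alert name and applied to audit events. `get_path`, `PREDICATES`, and `triggered_alerts` are hypothetical names introduced for illustration, and a missing `approval_id` is treated as NULL.

```python
def get_path(event, dotted, default=None):
    """Look up a dotted path such as 'resource.classification' in nested dicts."""
    current = event
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Hand-written equivalents of two condition strings (assumption: a missing
# approval_id counts as NULL).
PREDICATES = {
    "unauthorized_access": lambda e: (
        e.get("event_type") == "DATA_ACCESS" and e.get("result") == "DENIED"
    ),
    "pii_access_without_approval": lambda e: (
        get_path(e, "resource.classification") == "PII"
        and get_path(e, "compliance_context.approval_id") is None
    ),
}


def triggered_alerts(alert_config, event):
    """Return (name, severity, channels) for every alert whose predicate matches."""
    fired = []
    for alert in alert_config["alerts"]:
        predicate = PREDICATES.get(alert["name"])
        if predicate is not None and predicate(event):
            fired.append((alert["name"], alert["severity"], alert["notification_channels"]))
    return fired


config = {"alerts": [
    {"name": "unauthorized_access", "severity": "HIGH",
     "notification_channels": ["security_team", "compliance_officer"]},
    {"name": "pii_access_without_approval", "severity": "CRITICAL",
     "notification_channels": ["security_team", "legal_team"]},
]}
event = {"event_type": "DATA_ACCESS", "result": "SUCCESS",
         "resource": {"classification": "PII"}, "compliance_context": {}}
fired = triggered_alerts(config, event)
```

Keeping predicates in code rather than parsing the condition strings avoids evaluating untrusted expressions, at the cost of keeping the two in sync by hand.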
Compliance Dashboard Metrics
Access control effectiveness
Policy compliance rate
Audit coverage percentage
Data quality scores
Incident response times
Certification status
Implementation Checklist
S3 Bucket Setup (Handled by s3-provisioner)
Enable S3 bucket versioning (configurable)
Configure encryption (SSE-S3 by default)
Set up lifecycle policies (configurable)
Block public access
Create folder structure
Apply resource tags
AWS Service Configuration (You Must Configure)
Enable AWS CloudTrail for API logging
Configure S3 Access Logs
Set up AWS Config rules
Enable S3 Inventory reports
Configure S3 Object Lock for immutable data (if required)
Implement bucket policies and IAM roles
Configure cross-region replication (if required)
Set up AWS Macie for PII detection (if required)
Application Integration (You Must Implement)
Regular access reviews (quarterly)
Compliance assessments (annual)
Audit log reviews (monthly)
Data quality monitoring (daily)
Policy updates and approvals
Incident response drills
Security training for team members
Vendor risk assessments
Best Practices
Encryption Everywhere: Encrypt data at rest and in transit
Least Privilege: Grant minimum necessary permissions
Immutable Audit Logs: Use S3 Object Lock for compliance logs
Automated Compliance: Use AWS Config and custom Lambda functions
Regular Reviews: Conduct periodic access and compliance reviews
Documentation: Maintain up-to-date policies and procedures
Incident Response: Have a documented incident response plan
Training: Regular security and compliance training for all users