Application Architecture
Overview
ML Provisioner is a Docker-based, config-driven tool that scaffolds ML project infrastructure on AWS via CloudFormation. It generates tier-based ML project environments (CodeCommit repositories, CodeBuild projects, CodePipeline pipelines, SageMaker resources, and IAM roles) from a simple YAML configuration file.
What it is: Infrastructure scaffolding for ML projects.
What it is not: A complete ML solution. It does not provide data pipelines, trained models, notebooks, or solution-specific ML code. Those are delivered by Phase 3 ML Solution modules (future).
Position in the MLOps Suite
ML Provisioner is the fifth module in the Axon Tech Labs MLOps Infrastructure Suite. It sits between the security/network layer and the SageMaker environment layer:
Phase 1: Infrastructure (Complete)
├── VPC Provisioner: Network foundation (subnets, gateways, routing)
├── SG Provisioner: Security groups (tier-based, cross-tier references)
├── SEC Provisioner: IAM groups, roles, policies
├── S3 Provisioner: ML-optimized bucket structure
└── LB Provisioner: Load balancer provisioning (planned, ALB/NLB)
Phase 2: ML Platform (Current)
├── ML Provisioner: ML project scaffolding (THIS MODULE)
└── SageMaker Provisioner: Studio environment + extensions (next)
Phase 3: ML Solutions (Future)
├── Customer Churn Solution
├── Fraud Detection Solution
├── Demand Forecasting Solution
└── (additional solutions)
Dependency Chain via SSM Parameter Store
Each provisioner publishes its outputs to SSM Parameter Store. Downstream provisioners read those outputs automatically, with no manual wiring required.
S3 Provisioner (independent, no VPC dependency)
├── Provisions: ML data lake bucket structure
└── Publishes to SSM: Bucket names, folder paths
────────────────────────────────────────────────────────
VPC Provisioner
├── Provisions: VPC, subnets, routing
└── Publishes to SSM: VPC ID, subnet IDs
        ↓
SG Provisioner (reads VPC ID from SSM)
├── Provisions: Security groups scoped to the VPC
└── Publishes to SSM: Security group IDs
        ↓
LB Provisioner (reads VPC ID, subnet IDs, SG IDs from SSM; planned)
├── Provisions: Application/Network load balancers
└── Publishes to SSM: Load balancer ARNs, DNS names
        ↓
ML Provisioner (reads VPC ID, subnet IDs, SG IDs, LB outputs and S3 bucket names from SSM)
├── Provisions: ML pipelines, SageMaker registry, VPC endpoints
└── Publishes to SSM: Model registry ARN, KMS key ARN, endpoint IDs
        ↓
SageMaker Provisioner (reads ML outputs and S3 bucket names from SSM; next module)
└── Provisions: Studio domain, lifecycle configs
Design Decisions
Decision 1: Generic Templates, Parameterized Use Cases
The use case (e.g., fraud-detection, customer-churn) is a naming parameter, not a separate template. All tiers use generic CloudFormation templates. The use case name flows through to resource naming only.
Rationale: Avoids false promise of delivering a complete ML solution. Keeps v1.0.0 scope manageable. Use-case-specific templates are deferred to Phase 3 ML Solutions.
Implementation: The tier YAML files in schemas/products/ are pure data definitions: valid YAML with no placeholders or substitution tokens. They define what resources exist and their structure. The cfn_generator.py reads the tier definition and the client config separately, then constructs the CloudFormation template programmatically by building Python dict structures and dumping to YAML.
This is the same battle-tested approach used in the SG Provisioner. It avoids the fragility of string substitution (invalid YAML before substitution, silent errors, hard to validate) and keeps the tier definitions clean and independently readable.
The client edits the config file directly, with no partial overrides or merging logic. Every field is explicit.
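The dict-building approach can be illustrated with a minimal sketch. The tier definition shape, the `build_template` function, and the `models` suffix are assumptions made for illustration; the actual cfn_generator.py is not reproduced in this document. JSON is used for output only to keep the sketch dependency-free, whereas the tool itself dumps YAML.

```python
import json

# Hypothetical tier definition: pure data, no placeholders (shape is assumed).
TIER = {
    "resources": [
        {
            "logical_id": "ModelPackageGroup",
            "type": "AWS::SageMaker::ModelPackageGroup",
            "name_property": "ModelPackageGroupName",
            "suffix": "models",  # assumed suffix, for illustration only
        }
    ]
}

# Client config using the field names shown in this document.
CONFIG = {
    "client": {"company_prefix": "globalbank", "tenant_id": "c001"},
    "environment": {"env": "prod", "region": "us-west-2"},
    "ml_product": {"use_case": "fraud-detection"},
}


def build_template(tier: dict, config: dict) -> dict:
    """Combine tier structure and client identity into a CloudFormation dict."""
    client, env, ml = config["client"], config["environment"], config["ml_product"]
    base = "-".join([
        client["company_prefix"], env["env"], client["tenant_id"],
        env["region"], ml["use_case"], "ml",
    ])
    resources = {}
    for res in tier["resources"]:
        # Structure comes from the tier; names come from the client config.
        resources[res["logical_id"]] = {
            "Type": res["type"],
            "Properties": {res["name_property"]: f"{base}-{res['suffix']}"},
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = build_template(TIER, CONFIG)
print(json.dumps(template, indent=2))
```

Because the tier data and the client config only meet inside the function, both inputs remain valid on their own and can be validated independently.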
Decision 2: Tier as Primary Template Dimension
Three templates: starter.yaml, professional.yaml, enterprise.yaml. Tier determines which AWS resources are provisioned.
Rationale: Infrastructure complexity scales with tier, not use case. A fraud-detection starter has the same infrastructure as a churn starter; only the names differ.
Decision 3: CodeCommit as Primary, S3 as Fallback
Source control is configurable via source_control: codecommit or source_control: s3.
Rationale: CodeCommit returned to GA in November 2025 with AWS investment (Git LFS in Q1 2026, regional expansion Q3 2026). An S3 fallback is provided for organizations with CodeCommit restrictions or compliance requirements.
Decision 4: 12 Actions, Same Pattern as SG Provisioner
Consistent CLI interface across all provisioners. Developers who know one provisioner know all of them.
Decision 5: SSM Parameter Store Outputs
All deployed resource IDs are stored in SSM Parameter Store for downstream consumers (SageMaker Provisioner, CI/CD pipelines, other automation).
Decision 6: Separate CI/CD Artifacts Bucket from ML Data Lake
The ML Provisioner creates a dedicated CI/CD artifacts bucket, separate from the S3 Provisioner data lake bucket. The bucket is created in every tier; its name is published to SSM (BucketName) only in the Professional and Enterprise tiers.
S3 Provisioner bucket (ML data lake, scoped per tenant):
s3://{company_prefix}-{environment}-{tenant_id}-{region}/solutions/{use-case}/
Contains: ML data (raw/curated/processed), code, models, notebooks, configs. Read/written by data scientists and ML engineers. Long retention (years).
ML Provisioner bucket (CI/CD pipeline artifacts, scoped per project):
s3://{company_prefix}-{environment}-{tenant_id}-{use_case}-ml-artifacts/
Contains: CodeBuild outputs, CodePipeline stage artifacts, deployment packages. Read/written exclusively by CodeBuild and CodePipeline service roles. Short retention (30-90 days).
Rationale for separation:
Different access patterns: data scientists vs CI/CD service roles
Different lifecycle policies: years vs weeks/months
Different scope: one data lake shared across projects vs one artifacts bucket per project
Cleaner IAM: no overlap between human and pipeline permissions
Scope comparison:
| | S3 Provisioner Bucket | ML Provisioner Bucket |
|---|---|---|
| Purpose | ML data lake | CI/CD pipeline artifacts |
| Scope | Per tenant (shared across projects) | Per ML project |
| Structure | 130+ ML folders | Flat, pipeline-managed |
| Who writes | Data engineers, scientists | CodeBuild, CodePipeline |
| Retention | Years | 30-90 days |
| Tier | Always created | Always created |
Decision 7: No Service Catalog Dependency
The initial design considered AWS Service Catalog as the distribution mechanism. It was rejected due to:
API/console permission inconsistency
Narrow scope: designed for internal IT governance, not ML developer workflows
Additional complexity without proportional value
ML product templates are distributed directly via S3 (consistent with the docs hosting pattern).
Enterprise self-service pattern: Enterprise clients who want to wrap generated CloudFormation templates in Service Catalog for IT admin governance and self-service vending to data science teams can do so independently. The pattern for wrapping ML Provisioner generated templates in a Service Catalog product will be documented in INTEGRATION_EXAMPLES.md. This gives enterprise clients the governance model without adding Service Catalog complexity to the tool itself.
Revisit during SageMaker Provisioner: Service Catalog integration may become relevant again when designing the SageMaker Provisioner, particularly for SageMaker Projects which have native Service Catalog integration. This decision should be reviewed at that point.
Decision 8: IAM Resource Naming, Region Omitted
IAM is a global AWS service: role and policy names are unique per AWS account, not per region. Region therefore adds no differentiation value in IAM names and is omitted to stay within the 64-character AWS::IAM::Role name limit.
IAM naming pattern:
{company_prefix}-{env}-{tenant_id}-{use_case}-{suffix}
Example (standard resource vs IAM resource):
# Standard resource (region included)
globalbank-prod-c001-us-west-2-demand-forecasting-ml-build        # CodeBuild project
globalbank-prod-c001-us-west-2-demand-forecasting-ml-dashboard    # CloudWatch dashboard
# IAM resource (region omitted)
globalbank-prod-c001-demand-forecasting-ml-codebuild-role         # IAM role
globalbank-prod-c001-demand-forecasting-ml-build-policy           # IAM managed policy
All other resources retain the full standard pattern including region. This is the minimum deviation required to satisfy the AWS hard limit while preserving the naming convention's collision-free guarantees.
use_case maximum length: The validation schema enforces a 20-character maximum on ml_product.use_case. This is derived from AWS::IAM::Role being the tightest naming constraint: with typical config values, a use case longer than 20 characters would cause IAM role names to exceed the 64-character limit.
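The arithmetic behind the limit can be checked with a short sketch. The function name `iam_name` and the error messages are illustrative assumptions; only the naming pattern, the 64-character limit, and the 20-character use_case maximum come from this document.

```python
IAM_ROLE_NAME_MAX = 64  # AWS limit on IAM role names
USE_CASE_MAX = 20       # maximum this document says the validation schema enforces


def iam_name(company_prefix: str, env: str, tenant_id: str,
             use_case: str, suffix: str) -> str:
    """Build an IAM resource name; region is deliberately omitted."""
    if len(use_case) > USE_CASE_MAX:
        raise ValueError(f"use_case exceeds {USE_CASE_MAX} characters: {use_case!r}")
    name = "-".join([company_prefix, env, tenant_id, use_case, suffix])
    if len(name) > IAM_ROLE_NAME_MAX:
        raise ValueError(
            f"IAM name is {len(name)} characters (max {IAM_ROLE_NAME_MAX}): {name}"
        )
    return name


role = iam_name("globalbank", "prod", "c001", "demand-forecasting", "ml-codebuild-role")
print(role, len(role))  # globalbank-prod-c001-demand-forecasting-ml-codebuild-role 57

# Inserting the region segment ("us-west-2-") would add 10 characters.
print(len(role) + len("us-west-2-"))  # 67, which is over the 64-character limit
```

For this example the role name is 57 characters without the region and would be 67 with it, which is why the region segment is the one dropped.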
Decision 9: EventBridge Rule → CodePipeline (Direct Invocation)
The model approval automation uses a direct two-stage event-driven chain:
EventBridge Event Bus
→ EventBridge Rule (filters for SageMaker model approval events)
→ CodePipeline Deploy Pipeline (triggered directly as Rule target)
Why this architecture:
Direct invocation: EventBridge Rules natively support CodePipeline as a target. No intermediate resource is needed.
Fewer resources: eliminates AWS::Pipes::Pipe and the pipe-execution-role IAM role, reducing the resource count by 2 per stack.
Simpler security: the existing codepipeline-role is reused as the Rule target role. No additional IAM role required.
Lower latency: direct invocation removes an intermediate step.
No maintenance overhead: no Pipe resource to monitor, update, or troubleshoot.
CloudFormation implementation:
The Rule's Targets property references the deploy pipeline ARN constructed via Fn::Sub:
Targets:
  - Id: DeployPipelineTarget
    Arn: !Sub "arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:{deploy-pipeline-name}"
    RoleArn: !GetAtt CodepipelineRole.Arn
Note: EventBridge Pipes (AWS::Pipes::Pipe) was evaluated and rejected. Despite architectural appeal, Pipes does not support CodePipeline as a target in its PipeTargetParameters schema. The direct Rule → CodePipeline pattern is both simpler and fully supported.
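As a rough sketch of how such a Rule could be expressed in the generator's dict-building style: the `approval_rule` function and the pipeline name below are hypothetical, and the event pattern is an assumption based on the standard SageMaker model package state change event, not a copy of the generated template.

```python
def approval_rule(pipeline_name: str) -> dict:
    """Return an AWS::Events::Rule resource that starts a pipeline on model approval."""
    return {
        "Type": "AWS::Events::Rule",
        "Properties": {
            # Assumed pattern: match approved model packages emitted by SageMaker.
            "EventPattern": {
                "source": ["aws.sagemaker"],
                "detail-type": ["SageMaker Model Package State Change"],
                "detail": {"ModelApprovalStatus": ["Approved"]},
            },
            "Targets": [
                {
                    "Id": "DeployPipelineTarget",
                    # Fn::Sub resolves the region and account at deploy time.
                    "Arn": {
                        "Fn::Sub": "arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:"
                        + pipeline_name
                    },
                    "RoleArn": {"Fn::GetAtt": ["CodepipelineRole", "Arn"]},
                }
            ],
        },
    }


# Hypothetical pipeline name, for illustration only.
rule = approval_rule("globalbank-prod-c001-us-west-2-demand-forecasting-ml-deploy")
print(rule["Properties"]["Targets"][0]["Arn"]["Fn::Sub"])
```

A real rule would likely also filter on the model package group name so that approvals in other projects do not trigger this pipeline.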
Event-driven automation flow:
Data Scientist
└── Approves model in SageMaker Model Registry
        ↓
EventBridge Event Bus
└── Receives: ModelPackageGroupStateChange event
        ↓
EventBridge Rule
├── Filters: status = Approved
└── Target: Deploy Pipeline (direct invocation)
        ↓
CodePipeline: Deploy Pipeline
├── Pulls approved model artifact from Model Registry
├── Runs deployment stages
└── Deploys model to SageMaker endpoint
Decision 10: License Per AWS Account, No Template Sharing Mechanism
On generated templates: Generated CloudFormation templates in templates/ are plain YAML files with no embedded license. Sharing them is the MLOps engineer's responsibility and cannot be blocked, which is consistent with all IaC tools (Terraform, CDK, etc.). What is licensed is the tool itself: the Docker image that generates templates, validates config, deploys, checks drift, and generates reports.
No template sharing mechanism will be built into the tool. Rationale:
Outside the tool's scope
Every company has different sharing mechanisms (S3, Git, Confluence, etc.)
Adds complexity without licensing value
INTEGRATION_EXAMPLES.md will document the pattern for sharing templates via S3 if needed
Decision 11: IAM Policy, CodeCommit Resource Scoping
The CodeCommitManagement statement in the generated IAM policy restricts all CodeCommit actions to repositories whose names begin with {ml_name}- for a specific AWS account:
"Resource": "arn:aws:codecommit:{region}:{account}:{ml_name}-*"
Since ml_name encodes the full project identity ({company_prefix}-{env}-{tenant_id}-{region}-{use_case}[-{workload}]-ml), a user holding this policy can only manage repositories belonging to that one project. Two different projects produce two non-overlapping ml_name values and therefore two non-overlapping resource scopes, which is the Principle of Least Privilege in action.
Note: CodeCommit ARNs do not use a path separator before the repository name (unlike some other services). The pattern arn:aws:codecommit:{region}:{account}:{ml_name}-* is correct, with no leading slash before {ml_name}.
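The non-overlap property can be demonstrated with a small sketch. It uses Python's `fnmatchcase` as a stand-in for IAM wildcard matching, which is an approximation; the repository names ending in `-model-build` are assumed for illustration.

```python
from fnmatch import fnmatchcase


def codecommit_resource(region: str, account: str, ml_name: str) -> str:
    """Build the scoped Resource pattern (no slash before the repository name)."""
    return f"arn:aws:codecommit:{region}:{account}:{ml_name}-*"


ARN_PREFIX = "arn:aws:codecommit:us-west-2:123456789012:"

# Scope granted to the fraud-detection project.
fraud_scope = codecommit_resource(
    "us-west-2", "123456789012", "globalbank-prod-c001-us-west-2-fraud-detection-ml"
)

# Assumed repository names, for illustration only.
fraud_repo = ARN_PREFIX + "globalbank-prod-c001-us-west-2-fraud-detection-ml-model-build"
churn_repo = ARN_PREFIX + "globalbank-prod-c001-us-west-2-customer-churn-ml-model-build"

print(fnmatchcase(fraud_repo, fraud_scope))  # True: same project
print(fnmatchcase(churn_repo, fraud_scope))  # False: different project
```

Because the trailing `-ml` closes the project identity before the wildcard, a workload variant such as `...-fraud-detection-realtime-ml` also falls outside the `...-fraud-detection-ml-*` scope.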
Product Tier System
Starter Tier
Foundation MLOps platform. Suitable for small teams and proof-of-concept projects.
13 resources (CodeCommit):
AWS::SageMaker::ModelPackageGroup - Model Registry, approval gate, version management and traceability
AWS::CodeCommit::Repository (x2) - model-build and model-deploy repos
AWS::CodeBuild::Project (x2) - build and deploy projects
AWS::S3::Bucket - CodePipeline artifact store with S3 Versioning enabled
AWS::CodePipeline::Pipeline (x2) - build and deploy pipelines
AWS::IAM::Role (x3) - CodeBuild, CodePipeline, SageMaker execution roles
AWS::SSM::Parameter (x2) - ModelPackageGroupArn, RepositoryUrl
10 resources (S3; 3 resources removed vs CodeCommit):
AWS::CodeCommit::Repository (x2) - not created
AWS::SSM::Parameter reduced to (x1) - RepositoryUrl not published
Use case: Small ML team, single use case, standard security.
Professional Tier
Starter plus enhanced monitoring, event-driven automation, and additional policies.
19 resources (CodeCommit; all Starter resources for the CodeCommit scenario, plus):
AWS::Events::Rule - EventBridge rule triggering Deploy pipeline on model approval
AWS::CloudWatch::Dashboard - ML pipeline monitoring dashboard
AWS::IAM::ManagedPolicy (x2) - custom policies for enhanced access control
AWS::SSM::Parameter (x2) - BucketName, DashboardName (total x4 with Starter)
16 resources (S3; 3 resources removed vs CodeCommit):
AWS::CodeCommit::Repository (x2) - not created
AWS::SSM::Parameter reduced - RepositoryUrl not published (total x3 with Starter)
Use case: Growing ML team, multiple use cases, enhanced security and monitoring.
Enterprise Tier
Professional plus VPC integration, KMS encryption, compliance monitoring, and permission boundaries.
Scenario Counts:
| Source Control | VPC Mode | CFN Resources | SSM Parameters |
|---|---|---|---|
| CodeCommit | standalone | 41 | 11 |
| CodeCommit | sgprov | 39 | 10 |
| S3 | standalone | 38 | 10 |
| S3 | sgprov | 36 | 9 |
41 resources (CodeCommit + standalone):
AWS::SageMaker::ModelPackageGroup - Model Registry
AWS::CodeCommit::Repository (x2) - model-build and model-deploy repos
AWS::CodeBuild::Project (x2) - build and deploy projects
AWS::S3::Bucket - CodePipeline artifact store
AWS::CodePipeline::Pipeline (x2) - build and deploy pipelines
AWS::IAM::Role (x3) - CodeBuild, CodePipeline, SageMaker execution roles
AWS::IAM::ManagedPolicy (x3) - build policy, deploy policy, permission boundary
AWS::Events::Rule - EventBridge rule triggering Deploy pipeline on model approval
AWS::CloudWatch::Dashboard - ML pipeline monitoring dashboard
AWS::CloudWatch::Alarm (x2) - root account usage, unauthorized API calls
AWS::Logs::LogGroup - security compliance log group
AWS::Logs::MetricFilter (x2) - security alarm filters
AWS::SNS::Topic - security alerts topic
AWS::SNS::Subscription - alert email subscription
AWS::KMS::Key - encryption key for ML artifacts
AWS::KMS::Alias - key alias
AWS::EC2::VPCEndpoint (x4) - SageMaker API, SageMaker Runtime, S3 (Gateway), STS
AWS::EC2::SecurityGroup - dedicated SG for VPC endpoint traffic
AWS::SSM::Parameter (x11) - ModelPackageGroupArn, RepositoryUrl, BucketName, DashboardName, KmsKeyArn, LogGroupName, VpcEndpointIdSagemakerApi, VpcEndpointIdSagemakerRuntime, VpcEndpointIdS3, VpcEndpointIdSts, SecurityGroupId
39 resources (CodeCommit + sgprov; 2 resources removed vs CodeCommit + standalone):
AWS::EC2::SecurityGroup - not created (managed by SG Provisioner)
AWS::SSM::Parameter reduced to (x10) - SecurityGroupId not published
38 resources (S3 + standalone; 3 resources removed vs CodeCommit + standalone):
AWS::CodeCommit::Repository (x2) - not created
AWS::SSM::Parameter reduced to (x10) - RepositoryUrl not published
36 resources (S3 + sgprov; 5 resources removed vs CodeCommit + standalone):
AWS::CodeCommit::Repository (x2) - not created
AWS::EC2::SecurityGroup - not created (managed by SG Provisioner)
AWS::SSM::Parameter reduced to (x9) - RepositoryUrl and SecurityGroupId not published
Use case: Enterprise ML organization, strict security and compliance requirements, VPC-integrated workloads.
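The Enterprise scenario counts follow from the 41-resource baseline by subtraction. The sketch below reproduces that arithmetic; the function name `enterprise_resource_count` is illustrative, and the mode strings follow the `source_control` and `vpc_integration.mode` config values.

```python
BASELINE = 41  # Enterprise tier, CodeCommit + standalone


def enterprise_resource_count(source_control: str, vpc_mode: str) -> int:
    """Return the CloudFormation resource count for an Enterprise scenario."""
    count = BASELINE
    if source_control == "s3":
        # Two CodeCommit repositories and the RepositoryUrl SSM parameter are dropped.
        count -= 3
    if vpc_mode == "sg-provisioner":
        # The endpoint security group and the SecurityGroupId SSM parameter are dropped.
        count -= 2
    return count


for sc in ("codecommit", "s3"):
    for mode in ("standalone", "sg-provisioner"):
        print(sc, mode, enterprise_resource_count(sc, mode))
```

The four results (41, 39, 38, 36) match the Scenario Counts table, which confirms that the two reductions are independent of each other.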
VPC Integration Modes (Enterprise Tier)
Enterprise tier supports two VPC integration modes configured via vpc_integration.mode in the YAML config:
Standalone mode (client has ML Provisioner only):
ml_product:
  tier: enterprise
vpc_integration:
  mode: standalone
  vpc_source: parameter-store
  vpc_parameter_store_path: /vpc/globalbank-prod-c001-us-west-2-vpc/VPCId
  subnet_parameter_store_path: /vpc/globalbank-prod-c001-us-west-2-vpc/PrivateSubnetIds
Creates a dedicated AWS::EC2::SecurityGroup for VPC endpoint traffic plus all 4 VPC endpoints.
SG Provisioner mode (client has both ML Provisioner and SG Provisioner, or a bundle):
ml_product:
  tier: enterprise
vpc_integration:
  mode: sg-provisioner
  vpc_source: parameter-store
  vpc_parameter_store_path: /vpc/globalbank-prod-c001-us-west-2-vpc/VPCId
  subnet_parameter_store_path: /vpc/globalbank-prod-c001-us-west-2-vpc/PrivateSubnetIds
  sg_parameter_store_path: /sg/globalbank-prod-c001-us-west-2-sg/app/SecurityGroupId
Reads the existing SG ID from SSM Parameter Store. Creates only the 4 VPC endpoints, so no new security group is created and there is no conflict with SG Provisioner.
Note: A future bundle combining ML Provisioner and SG Provisioner will be offered. The bundle discount reflects the tighter integration between the two provisioners in enterprise deployments.
Note on S3 Gateway endpoint route table associations: The route_table_parameter_store_path (parameter-store mode) and route_table_ids (direct mode) fields are optional. When left empty, the S3 Gateway VPC endpoint is created without explicit route table associations, and the networking team is responsible for associating the endpoint with the appropriate route tables. When populated, the generator includes RouteTableIds in the endpoint resource and associations are configured automatically at deploy time.
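The difference between the two modes can be sketched as a required-field check. This is an illustrative assumption derived only from the two parameter-store examples above; the `missing_fields` helper is hypothetical, and the tool's actual validation schema (including direct mode) is not shown in this document.

```python
# Fields each mode needs when vpc_source is parameter-store (inferred from the examples).
REQUIRED_FIELDS = {
    "standalone": [
        "vpc_parameter_store_path",
        "subnet_parameter_store_path",
    ],
    "sg-provisioner": [
        "vpc_parameter_store_path",
        "subnet_parameter_store_path",
        "sg_parameter_store_path",  # existing SG is read from SSM instead of created
    ],
}


def missing_fields(vpc_integration: dict) -> list[str]:
    """Return required fields that are absent or empty for the selected mode."""
    mode = vpc_integration.get("mode")
    if mode not in REQUIRED_FIELDS:
        raise ValueError(f"unknown vpc_integration.mode: {mode!r}")
    return [name for name in REQUIRED_FIELDS[mode] if not vpc_integration.get(name)]


block = {
    "mode": "sg-provisioner",
    "vpc_source": "parameter-store",
    "vpc_parameter_store_path": "/vpc/globalbank-prod-c001-us-west-2-vpc/VPCId",
    "subnet_parameter_store_path": "/vpc/globalbank-prod-c001-us-west-2-vpc/PrivateSubnetIds",
}
print(missing_fields(block))  # ['sg_parameter_store_path']
```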
Configuration System
YAML Configuration File
The configuration file is a complete, self-contained YAML file. A client edits it directly to enforce their own settings, with no partial overrides or merging logic. Every field is explicit and visible.
client:
  company_name: Global Bank
  company_prefix: globalbank
  account_id: "123456789012"
  tenant_id: "c001"
environment:
  env: prod
  region: us-west-2
ml_product:
  use_case: fraud-detection        # naming parameter only, not a solution
  tier: professional               # starter | professional | enterprise
  source_control: codecommit       # codecommit | s3
  product_name_override: ""        # optional override for auto-generated name
  workload: ""                     # optional discriminator for multiple products
tags:
  cost_center: Fraud Operations
  project: Real-time Credit Card Fraud Detection System
  owner: fraud-ml-engineering-team
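Product Naming Convention
| Format | Pattern | Example |
|---|---|---|
| Without workload | {company_prefix}-{env}-{tenant_id}-{region}-{use_case}-ml | globalbank-prod-c001-us-west-2-fraud-detection-ml |
| With workload | {company_prefix}-{env}-{tenant_id}-{region}-{use_case}-{workload}-ml | globalbank-prod-c001-us-west-2-fraud-detection-realtime-ml |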
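A minimal sketch of how the product name could be derived from the configuration fields follows. The `product_name` function is illustrative, and treating product_name_override as a full replacement of the generated name is an assumption based on its comment in the config file.

```python
def product_name(config: dict) -> str:
    """Derive the product name from the config, with optional workload."""
    ml = config["ml_product"]
    # Assumption: a non-empty override replaces the generated name entirely.
    if ml.get("product_name_override"):
        return ml["product_name_override"]
    parts = [
        config["client"]["company_prefix"],
        config["environment"]["env"],
        config["client"]["tenant_id"],
        config["environment"]["region"],
        ml["use_case"],
    ]
    if ml.get("workload"):
        parts.append(ml["workload"])  # optional discriminator
    parts.append("ml")
    return "-".join(parts)


config = {
    "client": {"company_prefix": "globalbank", "tenant_id": "c001"},
    "environment": {"env": "prod", "region": "us-west-2"},
    "ml_product": {"use_case": "fraud-detection", "workload": ""},
}
print(product_name(config))  # globalbank-prod-c001-us-west-2-fraud-detection-ml

config["ml_product"]["workload"] = "realtime"
print(product_name(config))  # globalbank-prod-c001-us-west-2-fraud-detection-realtime-ml
```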
Multi-* System
The ML Provisioner is designed from the ground up as a multi-* system. Every dimension of variation is encoded in the configuration file and flows through to resource naming automatically. No special multi-* logic is needed in the tool.
The five dimensions:
Multi-Company (subsidiaries with own company prefix)
globalbank-prod-c001-us-west-2-fraud-detection-ml
globalbank-europe-prod-c001-eu-west-1-fraud-detection-ml
globalbank-asia-prod-c001-ap-southeast-1-fraud-detection-ml
Multi-Tenant (multiple tenants within same AWS account)
globalbank-prod-c001-us-west-2-fraud-detection-ml    # tenant c001
globalbank-prod-c002-us-west-2-fraud-detection-ml    # tenant c002
globalbank-prod-c003-us-west-2-fraud-detection-ml    # tenant c003
Multi-Environment (dev, staging, prod)
globalbank-dev-c001-us-west-2-fraud-detection-ml
globalbank-staging-c001-us-west-2-fraud-detection-ml
globalbank-prod-c001-us-west-2-fraud-detection-ml
Multi-Region
globalbank-prod-c001-us-west-2-fraud-detection-ml
globalbank-prod-c001-us-east-1-fraud-detection-ml
globalbank-prod-c001-eu-west-1-fraud-detection-ml
Multi-Use-Case (within same tenant/env/region)
globalbank-prod-c001-us-west-2-fraud-detection-ml
globalbank-prod-c001-us-west-2-customer-churn-ml
globalbank-prod-c001-us-west-2-demand-forecasting-ml
All five dimensions are handled by the same tool, same config pattern, same 12 commands. Each combination produces a completely isolated CloudFormation stack with its own resources and SSM Parameter Store paths.
The workload discriminator is the key differentiator that allows a client to deploy multiple distinct ML solutions within the exact same company/tenant/env/region combination without any naming collision. Without it, only one stack per use-case per environment is possible. With it, a client can create as many isolated variations as needed:
# Without workload: only one allowed per combination
globalbank-prod-c001-us-west-2-fraud-detection-ml
# With workload: unlimited isolated variations
globalbank-prod-c001-us-west-2-fraud-detection-realtime-ml
globalbank-prod-c001-us-west-2-fraud-detection-batch-ml
globalbank-prod-c001-us-west-2-fraud-detection-cards-ml
globalbank-prod-c001-us-west-2-fraud-detection-loans-ml
Each workload gets its own completely isolated CloudFormation stack, CodeCommit repos, pipelines, artifacts bucket, and SSM Parameter Store paths. Same company, same tenant, same environment, same region, same use case, but four independent ML scaffolding environments for different fraud detection workloads.
Config fields driving each dimension:
| Dimension | Config Field |
|---|---|
| Company | client.company_prefix |
| Tenant | client.tenant_id |
| Environment | environment.env |
| Region | environment.region |
| Use Case | ml_product.use_case |
| Workload | ml_product.workload |
This is one of the strongest differentiators of the Axon Tech Labs MLOps Suite: a single tool handles the full complexity of a large enterprise with subsidiaries, multiple teams, multiple environments, and multiple regions, all from simple YAML configuration files.
Multi-environment deployments are handled by creating separate configuration files per environment. No special multi-environment logic is needed in the tool, because isolation is automatic through resource naming.
Configuration files per environment:
configs/
├── globalbank-dev-c001-us-west-2-fraud-detection-ml.yaml
├── globalbank-staging-c001-us-west-2-fraud-detection-ml.yaml
└── globalbank-prod-c001-us-west-2-fraud-detection-ml.yaml
Each config sets environment.env to dev, staging, or prod respectively. The generator produces three completely isolated CloudFormation stacks:
globalbank-dev-c001-us-west-2-fraud-detection-ml-stack
globalbank-staging-c001-us-west-2-fraud-detection-ml-stack
globalbank-prod-c001-us-west-2-fraud-detection-ml-stack
Each stack has its own isolated resources (CodeCommit repos, CodeBuild projects, CodePipeline pipelines, S3 artifacts bucket) and its own SSM Parameter Store paths under /ml/globalbank-{env}-c001-.../.
Tiers can differ per environment, a common and recommended pattern:
| Environment | Tier | Rationale |
|---|---|---|
| dev | starter | Cheap, fast iteration, no compliance overhead |
| staging | professional | Mirrors prod, event-driven approval gate, monitoring |
| prod | enterprise | VPC-only, KMS encryption, compliance logging, permission boundaries |
This pattern creates a natural upgrade path and keeps costs low in non-production environments while maintaining full enterprise controls in production.
Note on licensing: The ML Provisioner license is enforced per AWS account via AWS License Manager. Each AWS account requires its own Marketplace subscription. Clients running all environments in a single AWS account need only one subscription. Clients following the recommended account-per-environment isolation pattern will need one subscription per account, which is consistent with standard AWS Marketplace licensing across all IaC tools.
Each environment is deployed independently:
# Deploy dev (starter tier)
docker run --rm \
-v ~/.aws:/home/mluser/.aws:ro \
-v $(pwd)/ml/configs:/app/configs:ro \
-v $(pwd)/ml/templates:/app/templates \
-v $(pwd)/ml/reports:/app/reports \
-e AWS_PROFILE=dev-profile \
ml-provisioner:starter \
-con globalbank-dev-c001-us-west-2-fraud-detection-ml.yaml \
-act deploy-product --force
# Deploy staging (professional tier)
docker run --rm \
-v ~/.aws:/home/mluser/.aws:ro \
-v $(pwd)/ml/configs:/app/configs:ro \
-v $(pwd)/ml/templates:/app/templates \
-v $(pwd)/ml/reports:/app/reports \
-e AWS_PROFILE=staging-profile \
ml-provisioner:professional \
-con globalbank-staging-c001-us-west-2-fraud-detection-ml.yaml \
-act deploy-product --force
# Deploy prod (enterprise tier)
docker run --rm \
-v ~/.aws:/home/mluser/.aws:ro \
-v $(pwd)/ml/configs:/app/configs:ro \
-v $(pwd)/ml/templates:/app/templates \
-v $(pwd)/ml/reports:/app/reports \
-e AWS_PROFILE=prod-profile \
ml-provisioner:enterprise \
-con globalbank-prod-c001-us-west-2-fraud-detection-ml.yaml \
-act deploy-product --force
See VPC Integration Modes in the Enterprise Tier section above. The vpc_integration block in the config replaces the simple vpc_source field and supports both standalone and SG Provisioner integration modes.
CloudFormation Generation
The cfn_generator.py module generates CloudFormation templates from two inputs: the selected tier blueprint and the client YAML configuration. The blueprint defines structure. The client config provides identity. They meet only inside the generator, with no string substitution and no placeholders.
Generation Flow
YAML Config
    ↓
ConfigLoader (parses YAML, resolves paths)
    ↓
ConfigValidator (validates against tier JSON schema)
    ↓
ProductLoader (loads tier blueprint)
    ↓
ProductValidator (security + schema checks)
    ↓
CfnGenerator (constructs CFN as Python dicts, dumps to YAML)
    ↓
CloudFormation Template (saved to templates/)
Security Validation
The ProductValidator runs before template generation and blocks or warns on dangerous patterns. Checks are scoped strictly to resources provisioned by ML Provisioner.
Blocking Checks (generation halted)
IAM:
IAM roles with * resource and no condition: enforces Principle of Least Privilege
Inline IAM policies: blocked in favor of managed policies for better versioning and reusability
Hardcoded credentials in generated template: scans for plaintext AWS access key patterns (AKIA...) before saving
Storage:
Public S3 buckets (enterprise tier): blocks PublicAccessBlockConfiguration disabled
Missing KMS encryption (enterprise tier): Customer Managed Keys required for auditability and control
Compute:
CodeBuild projects with privileged mode enabled without justification: prevents root-level access to host Docker daemon, a common privilege escalation vector
Networking:
SSH (port 22) or RDP (port 3389) open to 0.0.0.0/0 on endpoint SecurityGroup (enterprise standalone mode)
Logging:
CloudWatch LogGroup retention below 90 days (enterprise tier): enforces minimum retention for audit and incident response
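Two of the blocking checks above can be sketched in a few lines. The function names and the exact regular expression are illustrative assumptions; the ProductValidator source is not shown in this document, so this is not a description of its actual implementation.

```python
import re

# Access key IDs for long-term IAM credentials start with "AKIA" followed by
# 16 uppercase alphanumeric characters.
ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_keys(template_text: str) -> list[str]:
    """Return anything in the template text that looks like an AWS access key ID."""
    return ACCESS_KEY_RE.findall(template_text)


def has_unconditioned_wildcard(statement: dict) -> bool:
    """True if an IAM statement targets "*" and has no Condition block."""
    resources = statement.get("Resource", [])
    if isinstance(resources, str):
        resources = [resources]
    return "*" in resources and not statement.get("Condition")


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
print(find_access_keys('AccessKey: "AKIAIOSFODNN7EXAMPLE"'))  # ['AKIAIOSFODNN7EXAMPLE']
print(has_unconditioned_wildcard({"Action": "s3:*", "Resource": "*"}))  # True
print(has_unconditioned_wildcard(
    {"Action": "s3:GetObject", "Resource": "*",
     "Condition": {"StringEquals": {"aws:RequestedRegion": "us-west-2"}}}
))  # False
```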
Warning Checks (generation proceeds with warning)
IAM:
Roles containing high-risk actions (iam:PassRole, iam:CreateAccessKey, s3:DeleteBucket): warned rather than blocked because iam:PassRole is legitimately required for SageMaker execution roles. Warning includes justification guidance
Tags:
Missing required tags on any taggable resource: essential for ABAC (Attribute-Based Access Control) and governance
Out of Scope
VPC Flow Logs: VPC Provisioner responsibility. ML Provisioner does not create VPCs
Security Groups for application tiers: SG Provisioner responsibility
RDS PubliclyAccessible: ML Provisioner does not provision RDS
Load Balancer / CloudFront HTTPS enforcement: ML Provisioner does not provision these resources
EC2 public IP assignment: ML Provisioner does not provision EC2 instances
S3 account-level public access blocks: ML Provisioner does not modify account-level settings
For the full technical reference including blueprint schema, generation algorithm, client data injection, naming conventions, conditional generation logic, and concrete examples see CFN_GENERATOR.md.
SSM Parameter Store Integration
All deployed resource identifiers are stored in SSM Parameter Store at deployment time under the path /ml/{product-name}/, where {product-name} is derived from the configuration as:
{company_prefix}-{env}-{tenant_id}-{region}-{use_case}-ml
Example (globalbank enterprise deployment):
/ml/globalbank-prod-c001-us-west-2-demand-forecasting-ml/ModelPackageGroupArn
/ml/globalbank-prod-c001-us-west-2-demand-forecasting-ml/RepositoryUrl
/ml/globalbank-prod-c001-us-west-2-demand-forecasting-ml/BucketName
...
Full parameter list by tier:
/ml/{product-name}/ModelPackageGroupArn (all tiers)
/ml/{product-name}/RepositoryUrl (codecommit only)
/ml/{product-name}/BucketName (professional + enterprise)
/ml/{product-name}/DashboardName (professional + enterprise)
/ml/{product-name}/KmsKeyArn (enterprise only)
/ml/{product-name}/LogGroupName (enterprise only)
/ml/{product-name}/SecurityGroupId (enterprise standalone mode only)
/ml/{product-name}/VpcEndpointIdSagemakerApi (enterprise only)
/ml/{product-name}/VpcEndpointIdSagemakerRuntime (enterprise only)
/ml/{product-name}/VpcEndpointIdS3 (enterprise only)
/ml/{product-name}/VpcEndpointIdSts (enterprise only)
These paths are available for consumption by downstream tooling, such as a SageMaker Provisioner, to configure Studio domains and projects without manual cross-referencing.
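The per-tier parameter list can be expressed as a small function. The name `ssm_parameters` and its signature are illustrative assumptions; the conditions mirror the annotations in the list above.

```python
def ssm_parameters(product_name: str, tier: str, source_control: str,
                   vpc_mode: str = "standalone") -> list[str]:
    """Return the SSM parameter paths published for a given scenario."""
    names = ["ModelPackageGroupArn"]  # all tiers
    if source_control == "codecommit":
        names.append("RepositoryUrl")
    if tier in ("professional", "enterprise"):
        names += ["BucketName", "DashboardName"]
    if tier == "enterprise":
        names += [
            "KmsKeyArn",
            "LogGroupName",
            "VpcEndpointIdSagemakerApi",
            "VpcEndpointIdSagemakerRuntime",
            "VpcEndpointIdS3",
            "VpcEndpointIdSts",
        ]
        if vpc_mode == "standalone":
            names.append("SecurityGroupId")  # not published in sg-provisioner mode
    return [f"/ml/{product_name}/{name}" for name in names]


paths = ssm_parameters(
    "globalbank-prod-c001-us-west-2-demand-forecasting-ml",
    tier="enterprise",
    source_control="codecommit",
)
print(len(paths))  # 11
print(paths[0])    # /ml/globalbank-prod-c001-us-west-2-demand-forecasting-ml/ModelPackageGroupArn
```

The resulting counts (2 for Starter, 4 for Professional, and 11 for Enterprise standalone, all with CodeCommit) agree with the AWS::SSM::Parameter totals listed in the Product Tier System section.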
Actions Reference
All actions require AWS credentials for subscription validation. Actions marked Mutating additionally require --force.
| Action | AWS Calls | --force | Purpose |
|---|---|---|---|
| | subscription only | No | Validate YAML schema and field values |
| | subscription only | No | List available tier templates |
| | subscription only | No | Display tier resources and configuration |
| | subscription only | No | Generate least-privilege IAM policy |
| | subscription only | No | Generate CloudFormation template |
| | subscription only | No | Validate template locally |
| | subscription only | No | Generate pre-deployment HTML report |
| | read-only | No | Preview changes against deployed stack |
| | read-only | No | Detect infrastructure drift |
| | read-only | No | Deploy with isolated suffix for testing |
| deploy-product | mutating | Yes | Deploy ML product infrastructure |
| | mutating | Yes | Delete stack and all resources |
Source Tree
packages/ml-provisioner-tool/
├── configs
│   ├── edge-prod-b001-us-west-2-fraud-detection-ml-codecommit-workload.yaml
│   ├── edge-prod-b001-us-west-2-fraud-detection-ml-codecommit.yaml
│   ├── edge-prod-b001-us-west-2-fraud-detection-ml-s3-workload.yaml
│   ├── edge-prod-b001-us-west-2-fraud-detection-ml-s3.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-direct.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-ssm.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-direct-rtb.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-direct.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm-workload.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-sgprov-direct.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-sgprov-ssm.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-direct-rtb.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-direct.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-ssm-workload.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-ssm.yaml
│   ├── techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit-workload.yaml
│   ├── techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit.yaml
│   ├── techcorp-prod-a001-us-west-2-customer-churn-ml-s3-workload.yaml
│   ├── techcorp-prod-a001-us-west-2-customer-churn-ml-s3.yaml
│   └── examples
│       ├── enterprise
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-direct.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-ssm.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-direct-rtb.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-direct.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm-workload.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-sgprov-direct.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-sgprov-ssm.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-direct-rtb.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-direct.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-ssm-workload.yaml
│       │   └── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-ssm.yaml
│       ├── professional
│       │   ├── edge-prod-b001-us-west-2-fraud-detection-ml-codecommit-workload.yaml
│       │   ├── edge-prod-b001-us-west-2-fraud-detection-ml-codecommit.yaml
│       │   ├── edge-prod-b001-us-west-2-fraud-detection-ml-s3-workload.yaml
│       │   └── edge-prod-b001-us-west-2-fraud-detection-ml-s3.yaml
│       └── starter
│           ├── techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit-workload.yaml
│           ├── techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit.yaml
│           ├── techcorp-prod-a001-us-west-2-customer-churn-ml-s3-workload.yaml
│           └── techcorp-prod-a001-us-west-2-customer-churn-ml-s3.yaml
├── docker
│   ├── Dockerfile
│   └── entrypoint.sh
├── docs
│   ├── sphinx
│   │   └── source
│   │       ├── conf.py
│   │       ├── index.rst
│   │       └── onboarding
│   ├── APPLICATION_ARCHITECTURE.md
│   ├── CFN_GENERATOR.md                        # internal
│   ├── CFN_GENERATOR_IMPLEMENTATION_STEPS.md   # internal
│   ├── CONFIGURATION.md
│   ├── CONFIGURATION_GUIDE.md
│   ├── FEEDBACK.md
│   ├── IAM_PERMISSIONS.md
│   ├── INTEGRATION_EXAMPLES.md
│   ├── MIGRATION_GUIDE.md
│   ├── NAMING_CONVENTIONS.md
│   ├── PREREQUISITES.md
│   ├── README.md
│   ├── RELEASE_NOTES.md
│   ├── RESOURCES_EXPLAINED.md
│   ├── ROADMAP.md
│   ├── SAMPLE_REPORTS.md
│   ├── SECURITY_GUIDELINES.md
│   ├── SUPPORT.md
│   ├── TROUBLESHOOTING.md
│   ├── UPDATE_PROCEDURES.md
│   └── USER_GUIDE.md
├── policies
├── reports
├── schemas
│   ├── products
│   │   ├── enterprise.yaml
│   │   ├── professional.yaml
│   │   └── starter.yaml
│   ├── validation-schema-enterprise.yaml
│   ├── validation-schema-professional.yaml
│   ├── validation-schema-starter.yaml
│   └── validation-schema.yaml
├── src
│   └── ml_provisioner
│       ├── __init__.py
│       ├── __main__.py
│       ├── cli.py
│       ├── config
│       │   ├── __init__.py
│       │   ├── app_config.yaml
│       │   └── loader.py
│       ├── core
│       │   ├── __init__.py
│       │   └── ml_manager.py
│       ├── generators
│       │   ├── __init__.py
│       │   └── cfn_generator.py
│       ├── license
│       │   ├── __init__.py
│       │   └── validator.py
│       ├── models
│       │   ├── __init__.py
│       │   └── product.py
│       ├── products
│       │   ├── __init__.py
│       │   ├── loader.py
│       │   └── validator.py
│       └── utils
│           ├── __init__.py
│           ├── html_generator.py
│           └── review_report.py
├── templates
├── tests
├── LICENSE.txt
├── README.MD
├── Makefile
├── pyproject.toml
├── setup.py
└── uv.lock
Future Roadmap
See Roadmap for the full roadmap including planned features and deferred enhancements.