Application Architecture¶

Table of Contents¶

Overview
Position in the MLOps Suite
Design Decisions
Product Tier System
Configuration System
CloudFormation Generation
SSM Parameter Store Integration
Actions Reference
Source Tree
Future Roadmap

Overview¶

ML Provisioner is a Docker-based, config-driven tool that scaffolds ML project infrastructure on AWS via CloudFormation. It generates tier-based ML project environments — CodeCommit repositories, CodeBuild projects, CodePipeline pipelines, SageMaker resources, and IAM roles — from a simple YAML configuration file.

What it is: Infrastructure scaffolding for ML projects.

What it is not: A complete ML solution. It does not provide data pipelines, trained models, notebooks, or solution-specific ML code. Those are delivered by Phase 3 ML Solution modules (future).

Position in the MLOps Suite¶

ML Provisioner is the fifth module in the Axon Tech Labs MLOps Infrastructure Suite. It sits between the security/network layer and the SageMaker environment layer:

Phase 1 — Infrastructure (Complete)
  ├── VPC Provisioner      → Network foundation (subnets, gateways, routing)
  ├── SG Provisioner       → Security groups (tier-based, cross-tier references)
  ├── SEC Provisioner      → IAM groups, roles, policies
  ├── S3 Provisioner       → ML-optimized bucket structure
  └── LB Provisioner       → Load balancer provisioning (planned — ALB/NLB)

Phase 2 — ML Platform (Current)
  ├── ML Provisioner       → ML project scaffolding  ← THIS MODULE
  └── SageMaker Provisioner → Studio environment + extensions (next)

Phase 3 — ML Solutions (Future)
  ├── Customer Churn Solution
  ├── Fraud Detection Solution
  ├── Demand Forecasting Solution
  └── (additional solutions)

Dependency Chain via SSM Parameter Store¶

Each provisioner publishes its outputs to SSM Parameter Store. Downstream provisioners read those outputs automatically — no manual wiring required.

S3 Provisioner  (independent — no VPC dependency)
  └── Provisions: ML data lake bucket structure
  └── Publishes to SSM: Bucket names, folder paths

─────────────────────────────────────────────────────────

VPC Provisioner
  └── Provisions: VPC, subnets, routing
  └── Publishes to SSM: VPC ID, subnet IDs
        ↓
SG Provisioner  (reads VPC ID from SSM)
  └── Provisions: Security groups scoped to the VPC
  └── Publishes to SSM: Security group IDs
        ↓
LB Provisioner  (reads VPC ID, subnet IDs, SG IDs from SSM — planned)
  └── Provisions: Application/Network load balancers
  └── Publishes to SSM: Load balancer ARNs, DNS names
        ↓
ML Provisioner  (reads VPC ID, subnet IDs, SG IDs, LB outputs and S3 bucket names from SSM)
  └── Provisions: ML pipelines, SageMaker registry, VPC endpoints
  └── Publishes to SSM: Model registry ARN, KMS key ARN, endpoint IDs
        ↓
SageMaker Provisioner  (reads ML outputs and S3 bucket names from SSM — next module)
  └── Provisions: Studio domain, lifecycle configs
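
As a rough illustration of the read side of this chain, the sketch below resolves upstream outputs through an injected getter. The function name `resolve_upstream` and the in-memory store are hypothetical stand-ins, the sample values are invented, and the comma-separated subnet list is an assumption. The two paths follow the VPC Provisioner pattern used in the enterprise config examples in this document. With boto3, the getter would typically wrap `ssm.get_parameter(Name=path)["Parameter"]["Value"]`.

```python
def resolve_upstream(get_parameter, paths):
    """Look up each upstream output by its SSM path using the supplied getter."""
    return {key: get_parameter(path) for key, path in paths.items()}


# In-memory stand-in for SSM Parameter Store (sample values are invented).
FAKE_STORE = {
    "/vpc/globalbank-prod-c001-us-west-2-vpc/VPCId": "vpc-0123456789abcdef0",
    "/vpc/globalbank-prod-c001-us-west-2-vpc/PrivateSubnetIds": "subnet-aaa111,subnet-bbb222",
}

# Logical input name -> SSM path published by the upstream provisioner.
PATHS = {
    "vpc_id": "/vpc/globalbank-prod-c001-us-west-2-vpc/VPCId",
    "subnet_ids": "/vpc/globalbank-prod-c001-us-west-2-vpc/PrivateSubnetIds",
}

inputs = resolve_upstream(FAKE_STORE.__getitem__, PATHS)
# Assumption: subnet IDs are stored as one comma-separated string.
subnets = inputs["subnet_ids"].split(",")
print(inputs["vpc_id"], subnets)
```

Passing the getter in keeps the lookup logic testable without AWS credentials.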

Design Decisions¶

Decision 1: Generic Templates, Parameterized Use Cases¶

The use case (e.g., fraud-detection, customer-churn) is a naming parameter, not a separate template. All tiers use generic CloudFormation templates. The use case name flows through to resource naming only.

Rationale: Avoids the false promise of delivering a complete ML solution and keeps the v1.0.0 scope manageable. Use-case-specific templates are deferred to Phase 3 ML Solutions.

Implementation: The tier YAML files in schemas/products/ are pure data definitions — valid YAML with no placeholders or substitution tokens. They define what resources exist and their structure. The cfn_generator.py reads the tier definition and the client config separately, then constructs the CloudFormation template programmatically by building Python dict structures and dumping to YAML.

This is the same battle-tested approach used in the SG Provisioner. It avoids the fragility of string substitution (invalid YAML before substitution, silent errors, hard to validate) and keeps the tier definitions clean and independently readable.
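
A minimal sketch of this dict-building approach follows. The tier definition shape (`logical_id`, `type`, `name_property`, `suffix`), the `models` suffix, and the `build_template` helper are all hypothetical, since the actual cfn_generator.py interface is not shown here. JSON output is used only to keep the example within the standard library; the generator itself dumps YAML.

```python
import json

# Hypothetical tier definition: pure data, no placeholders or tokens.
TIER_DEFINITION = {
    "resources": [
        {
            "logical_id": "ModelPackageGroup",
            "type": "AWS::SageMaker::ModelPackageGroup",
            "name_property": "ModelPackageGroupName",
            "suffix": "models",  # illustrative suffix, not from the tool
        },
    ]
}

CLIENT_CONFIG = {
    "company_prefix": "globalbank",
    "env": "prod",
    "tenant_id": "c001",
    "region": "us-west-2",
    "use_case": "fraud-detection",
}


def build_template(tier, cfg):
    """Combine tier structure and client identity into a CloudFormation dict."""
    ml_name = "-".join([cfg["company_prefix"], cfg["env"], cfg["tenant_id"],
                        cfg["region"], cfg["use_case"], "ml"])
    resources = {}
    for res in tier["resources"]:
        resources[res["logical_id"]] = {
            "Type": res["type"],
            "Properties": {res["name_property"]: f"{ml_name}-{res['suffix']}"},
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = build_template(TIER_DEFINITION, CLIENT_CONFIG)
print(json.dumps(template, indent=2))
```

Because the template is a plain dict until the final dump, every intermediate value can be inspected or validated before anything is written to disk.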

The client edits the config file directly — no partial overrides or merging logic. Every field is explicit.

Decision 2: Tier as Primary Template Dimension¶

Three templates: starter.yaml, professional.yaml, enterprise.yaml. Tier determines which AWS resources are provisioned.

Rationale: Infrastructure complexity scales with tier, not use case. A fraud-detection starter has the same infrastructure as a churn starter — only the names differ.

Decision 3: CodeCommit as Primary, S3 as Fallback¶

Source control is configurable via source_control: codecommit or source_control: s3.

Rationale: CodeCommit returned to GA in November 2025 with renewed AWS investment (Git LFS in Q1 2026, regional expansion in Q3 2026). The S3 fallback is provided for organizations with CodeCommit restrictions or compliance requirements.

Decision 4: 12 Actions — Same Pattern as SG Provisioner¶

Consistent CLI interface across all provisioners. Developers who know one provisioner know all of them.

Decision 5: SSM Parameter Store Outputs¶

All deployed resource IDs are stored in SSM Parameter Store for downstream consumers (SageMaker Provisioner, CI/CD pipelines, other automation).

Decision 6: Separate CI/CD Artifacts Bucket from ML Data Lake¶

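The ML Provisioner creates a dedicated CI/CD artifacts bucket, separate from the S3 Provisioner data lake bucket. The bucket is part of every tier's resource list; its name is published to SSM (BucketName) in the Professional and Enterprise tiers.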

S3 Provisioner bucket — ML data lake, scoped per tenant:

s3://{company_prefix}-{environment}-{tenant_id}-{region}/solutions/{use-case}/

Contains: ML data (raw/curated/processed), code, models, notebooks, configs. Read/written by data scientists and ML engineers. Long retention (years).

ML Provisioner bucket — CI/CD pipeline artifacts, scoped per project:

s3://{company_prefix}-{environment}-{tenant_id}-{use_case}-ml-artifacts/

Contains: CodeBuild outputs, CodePipeline stage artifacts, deployment packages. Read/written exclusively by CodeBuild and CodePipeline service roles. Short retention (30-90 days).

Rationale for separation:

- Different access patterns: data scientists vs CI/CD service roles
- Different lifecycle policies: years vs weeks/months
- Different scope: one data lake shared across projects vs one artifacts bucket per project
- Cleaner IAM: no overlap between human and pipeline permissions

Scope comparison:

Attribute	S3 Provisioner Bucket	ML Provisioner Bucket
Purpose	ML data lake	CI/CD pipeline artifacts
Scope	Per tenant (shared across projects)	Per ML project
Structure	130+ ML folders	Flat, pipeline-managed
Who writes	Data engineers, scientists	CodeBuild, CodePipeline
Retention	Years	30-90 days
Tier	Always created	Always created

Decision 7: No Service Catalog Dependency¶

The initial design considered AWS Service Catalog as the distribution mechanism. It was rejected due to:

- API/console permission inconsistency
- Narrow scope: designed for internal IT governance, not ML developer workflows
- Additional complexity without proportional value

ML product templates are instead distributed directly via S3 (consistent with the docs hosting pattern).

Enterprise self-service pattern: Enterprise clients who want to wrap generated CloudFormation templates in Service Catalog for IT admin governance and self-service vending to data science teams can do so independently. The pattern for wrapping ML Provisioner generated templates in a Service Catalog product will be documented in INTEGRATION_EXAMPLES.md. This gives enterprise clients the governance model without adding Service Catalog complexity to the tool itself.

Revisit during SageMaker Provisioner: Service Catalog integration may become relevant again when designing the SageMaker Provisioner, particularly for SageMaker Projects which have native Service Catalog integration. This decision should be reviewed at that point.

Decision 8: IAM Resource Naming — Region Omitted¶

IAM is a global AWS service — role and policy names are unique per AWS account, not per region. Region therefore adds no differentiation value in IAM names and is omitted to stay within the 64-character AWS::IAM::Role name limit.

IAM naming pattern:

{company_prefix}-{env}-{tenant_id}-{use_case}-{suffix}

Example — standard resource vs IAM resource:

# Standard resource (region included)
globalbank-prod-c001-us-west-2-demand-forecasting-ml-build        ← CodeBuild project
globalbank-prod-c001-us-west-2-demand-forecasting-ml-dashboard    ← CloudWatch dashboard

# IAM resource (region omitted)
globalbank-prod-c001-demand-forecasting-ml-codebuild-role         ← IAM role
globalbank-prod-c001-demand-forecasting-ml-build-policy           ← IAM managed policy

All other resources retain the full standard pattern including region. This is the minimum deviation required to satisfy the AWS hard limit while preserving the naming convention’s collision-free guarantees.

use_case maximum length: The validation schema enforces a 20-character maximum on ml_product.use_case. This is derived from AWS::IAM::Role being the tightest naming constraint — with typical config values, a use case longer than 20 characters would cause IAM role names to exceed the 64-character limit.
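
The length arithmetic behind this decision can be checked with a short sketch. The helper names `iam_name` and `check_iam_role_name` are hypothetical; the pattern and the 64-character limit come from the text above.

```python
IAM_ROLE_NAME_MAX = 64  # AWS::IAM::Role name limit


def iam_name(company_prefix, env, tenant_id, use_case, suffix):
    """Build an IAM name using the region-omitted pattern."""
    return f"{company_prefix}-{env}-{tenant_id}-{use_case}-{suffix}"


def check_iam_role_name(name):
    """Return the name unchanged, or raise ValueError if it exceeds the limit."""
    if len(name) > IAM_ROLE_NAME_MAX:
        raise ValueError(
            f"{name!r} is {len(name)} characters; limit is {IAM_ROLE_NAME_MAX}"
        )
    return name


role = check_iam_role_name(
    iam_name("globalbank", "prod", "c001", "demand-forecasting", "ml-codebuild-role")
)

# The same role name with the region segment kept, for comparison.
with_region = "globalbank-prod-c001-us-west-2-demand-forecasting-ml-codebuild-role"
print(len(role), len(with_region))  # 57 67
```

With the region segment included, the same role name would be 67 characters and fail the check, which is the case the region-omitted pattern avoids.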

Decision 9: EventBridge Rule → CodePipeline (Direct Invocation)¶

The model approval automation uses a direct two-stage event-driven chain:

EventBridge Event Bus
    → EventBridge Rule  (filters for SageMaker model approval events)
        → CodePipeline Deploy Pipeline  (triggered directly as Rule target)

Why this architecture:

- Direct invocation: EventBridge Rules natively support CodePipeline as a target. No intermediate resource is needed.
- Fewer resources: eliminates AWS::Pipes::Pipe and the pipe-execution-role IAM role, reducing the resource count by 2 per stack.
- Simpler security: the existing codepipeline-role is reused as the Rule target role. No additional IAM role is required.
- Lower latency: direct invocation removes an intermediate step.
- No maintenance overhead: no Pipe resource to monitor, update, or troubleshoot.

CloudFormation implementation:

The Rule’s Targets property references the deploy pipeline ARN constructed via Fn::Sub:

Targets:
  - Id: DeployPipelineTarget
    Arn: !Sub "arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:{deploy-pipeline-name}"
    RoleArn: !GetAtt CodepipelineRole.Arn

Note: EventBridge Pipes (AWS::Pipes::Pipe) was evaluated and rejected. Despite architectural appeal, Pipes does not support CodePipeline as a target in its PipeTargetParameters schema. The direct Rule → CodePipeline pattern is both simpler and fully supported.

Event-driven automation flow:

Data Scientist
  └── Approves model in SageMaker Model Registry
        ↓
EventBridge Event Bus
  └── Receives: SageMaker Model Package State Change event
        ↓
EventBridge Rule
  └── Filters: ModelApprovalStatus = Approved
  └── Target: Deploy Pipeline (direct invocation)
        ↓
CodePipeline — Deploy Pipeline
  └── Pulls approved model artifact from Model Registry
  └── Runs deployment stages
  └── Deploys model to SageMaker endpoint
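
The rule in this flow could be expressed as a Python dict along the following lines. This is a sketch under assumptions: the helper `model_approval_rule`, its two arguments, and the example names are hypothetical. The event pattern uses the `aws.sagemaker` source, the `SageMaker Model Package State Change` detail type, and the `ModelApprovalStatus` detail field that SageMaker emits for model registry changes.

```python
def model_approval_rule(model_package_group_name, deploy_pipeline_name):
    """Return an AWS::Events::Rule resource dict that starts the deploy pipeline."""
    pipeline_arn = (
        "arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:" + deploy_pipeline_name
    )
    return {
        "Type": "AWS::Events::Rule",
        "Properties": {
            "State": "ENABLED",
            "EventPattern": {
                "source": ["aws.sagemaker"],
                "detail-type": ["SageMaker Model Package State Change"],
                "detail": {
                    # Only react to approvals in this project's registry group.
                    "ModelPackageGroupName": [model_package_group_name],
                    "ModelApprovalStatus": ["Approved"],
                },
            },
            "Targets": [
                {
                    "Id": "DeployPipelineTarget",
                    "Arn": {"Fn::Sub": pipeline_arn},
                    "RoleArn": {"Fn::GetAtt": ["CodepipelineRole", "Arn"]},
                }
            ],
        },
    }


# Illustrative names only; the real suffixes are defined by the tier blueprint.
rule = model_approval_rule(
    "globalbank-prod-c001-us-west-2-fraud-detection-ml-models",
    "globalbank-prod-c001-us-west-2-fraud-detection-ml-deploy",
)
print(rule["Properties"]["EventPattern"]["detail"])
```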

Decision 11: IAM Policy — CodeCommit Resource Scoping¶

The CodeCommitManagement statement in the generated IAM policy restricts all CodeCommit actions to repositories whose names begin with {ml_name}- for a specific AWS account:

"Resource": "arn:aws:codecommit:{region}:{account}:{ml_name}-*"

Since ml_name encodes the full project identity ({company_prefix}-{env}-{tenant_id}-{region}-{use_case}[-{workload}]-ml), a user holding this policy can only manage repositories belonging to that one project. Two different projects produce two non-overlapping ml_name values and therefore two non-overlapping resource scopes — Principle of Least Privilege in action.

Note: CodeCommit ARNs do not use a path separator before the repository name (unlike some other services). The pattern arn:aws:codecommit:{region}:{account}:{ml_name}-* is correct — no leading slash before {ml_name}.
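
The non-overlap claim can be illustrated with a small sketch. The helper names are hypothetical, the `codecommit:*` action list is a placeholder rather than the tool's actual action set, and `fnmatchcase` is used only as an approximation of IAM wildcard matching for names without special characters.

```python
from fnmatch import fnmatchcase


def codecommit_resource(region, account, ml_name):
    """Build the resource pattern scoped to one project's repositories."""
    return f"arn:aws:codecommit:{region}:{account}:{ml_name}-*"


def codecommit_statement(region, account, ml_name):
    """Return a policy statement dict (the action list is a placeholder)."""
    return {
        "Sid": "CodeCommitManagement",
        "Effect": "Allow",
        "Action": "codecommit:*",
        "Resource": codecommit_resource(region, account, ml_name),
    }


REGION, ACCOUNT = "us-west-2", "123456789012"
fraud = codecommit_statement(REGION, ACCOUNT, "globalbank-prod-c001-us-west-2-fraud-detection-ml")

base = f"arn:aws:codecommit:{REGION}:{ACCOUNT}:"
fraud_repo = base + "globalbank-prod-c001-us-west-2-fraud-detection-ml-model-build"
churn_repo = base + "globalbank-prod-c001-us-west-2-customer-churn-ml-model-build"
workload_repo = base + "globalbank-prod-c001-us-west-2-fraud-detection-realtime-ml-model-build"

for arn in (fraud_repo, churn_repo, workload_repo):
    print(arn.rsplit(":", 1)[1], fnmatchcase(arn, fraud["Resource"]))
```

Only the first repository matches. A workload variant falls outside the scope too, because the workload segment sits before the `-ml` suffix.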

Product Tier System¶

Starter Tier¶

Foundation MLOps platform. Suitable for small teams and proof-of-concept projects.

13 resources with CodeCommit:
- AWS::SageMaker::ModelPackageGroup — Model Registry, approval gate, version management and traceability
- AWS::CodeCommit::Repository (x2) — model-build and model-deploy repos
- AWS::CodeBuild::Project (x2) — build and deploy projects
- AWS::S3::Bucket — CodePipeline artifact store with S3 Versioning enabled
- AWS::CodePipeline::Pipeline (x2) — build and deploy pipelines
- AWS::IAM::Role (x3) — CodeBuild, CodePipeline, SageMaker execution roles
- AWS::SSM::Parameter (x2) — ModelPackageGroupArn, RepositoryUrl
10 resources with S3 (3 resources removed vs CodeCommit):
- AWS::CodeCommit::Repository (x2) — not created
- AWS::SSM::Parameter reduced to (x1) — RepositoryUrl not published

Use case: Small ML team, single use case, standard security.

Professional Tier¶

Starter plus enhanced monitoring, event-driven automation, and additional policies.

19 resources with CodeCommit (all Starter resources for the CodeCommit scenario, plus):
- AWS::Events::Rule — EventBridge rule triggering Deploy pipeline on model approval
- AWS::CloudWatch::Dashboard — ML pipeline monitoring dashboard
- AWS::IAM::ManagedPolicy (x2) — custom policies for enhanced access control
- AWS::SSM::Parameter (x2) — BucketName, DashboardName (total x4 with Starter)
16 resources with S3 (3 resources removed vs CodeCommit):
- AWS::CodeCommit::Repository (x2) — not created
- AWS::SSM::Parameter reduced — RepositoryUrl not published (total x3 with Starter)

Use case: Growing ML team, multiple use cases, enhanced security and monitoring.

Enterprise Tier¶

Professional plus VPC integration, KMS encryption, compliance monitoring, and permission boundaries.

Scenario Counts:

Source Control	VPC Mode	CFN Resources	SSM Parameters
CodeCommit	standalone	41	11
CodeCommit	sgprov	39	10
S3	standalone	38	10
S3	sgprov	36	9

41 resources with CodeCommit + standalone:
- AWS::SageMaker::ModelPackageGroup — Model Registry
- AWS::CodeCommit::Repository (x2) — model-build and model-deploy repos
- AWS::CodeBuild::Project (x2) — build and deploy projects
- AWS::S3::Bucket — CodePipeline artifact store
- AWS::CodePipeline::Pipeline (x2) — build and deploy pipelines
- AWS::IAM::Role (x3) — CodeBuild, CodePipeline, SageMaker execution roles
- AWS::IAM::ManagedPolicy (x3) — build policy, deploy policy, permission boundary
- AWS::Events::Rule — EventBridge rule triggering Deploy pipeline on model approval
- AWS::CloudWatch::Dashboard — ML pipeline monitoring dashboard
- AWS::CloudWatch::Alarm (x2) — root account usage, unauthorized API calls
- AWS::Logs::LogGroup — security compliance log group
- AWS::Logs::MetricFilter (x2) — security alarm filters
- AWS::SNS::Topic — security alerts topic
- AWS::SNS::Subscription — alert email subscription
- AWS::KMS::Key — encryption key for ML artifacts
- AWS::KMS::Alias — key alias
- AWS::EC2::VPCEndpoint (x4) — SageMaker API, SageMaker Runtime, S3 (Gateway), STS
- AWS::EC2::SecurityGroup — dedicated SG for VPC endpoint traffic
- AWS::SSM::Parameter (x11) — ModelPackageGroupArn, RepositoryUrl, BucketName, DashboardName, KmsKeyArn, LogGroupName, VpcEndpointIdSagemakerApi, VpcEndpointIdSagemakerRuntime, VpcEndpointIdS3, VpcEndpointIdSts, SecurityGroupId
39 resources with CodeCommit + sgprov (2 resources removed vs CodeCommit + standalone):
- AWS::EC2::SecurityGroup — not created (managed by SG Provisioner)
- AWS::SSM::Parameter reduced to (x10) — SecurityGroupId not published
38 resources with S3 + standalone (3 resources removed vs CodeCommit + standalone):
- AWS::CodeCommit::Repository (x2) — not created
- AWS::SSM::Parameter reduced to (x10) — RepositoryUrl not published
36 resources with S3 + sgprov (5 resources removed vs CodeCommit + standalone):
- AWS::CodeCommit::Repository (x2) — not created
- AWS::EC2::SecurityGroup — not created (managed by SG Provisioner)
- AWS::SSM::Parameter reduced to (x9) — RepositoryUrl and SecurityGroupId not published

Use case: Enterprise ML organization, strict security and compliance requirements, VPC-integrated workloads.
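
The scenario counts across all three tiers follow one simple rule, sketched below. The function `resource_count` is a hypothetical helper for checking the arithmetic, not part of the tool; the base counts and deductions are taken from the tier listings above.

```python
# Resource counts for the CodeCommit scenario (standalone mode for enterprise).
BASE_COUNTS = {"starter": 13, "professional": 19, "enterprise": 41}


def resource_count(tier, source_control="codecommit", vpc_mode="standalone"):
    """Return the expected CloudFormation resource count for one scenario."""
    count = BASE_COUNTS[tier]
    if source_control == "s3":
        count -= 3  # two CodeCommit repositories plus the RepositoryUrl parameter
    if tier == "enterprise" and vpc_mode == "sg-provisioner":
        count -= 2  # endpoint security group plus the SecurityGroupId parameter
    return count


for tier in BASE_COUNTS:
    for source in ("codecommit", "s3"):
        print(tier, source, resource_count(tier, source))
```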

VPC Integration Modes (Enterprise Tier)¶

Enterprise tier supports two VPC integration modes configured via vpc_integration.mode in the YAML config:

Standalone mode — client has ML Provisioner only:

ml_product:
  tier: enterprise
  vpc_integration:
    mode: standalone
    vpc_source: parameter-store
    vpc_parameter_store_path: /vpc/globalbank-prod-c001-us-west-2-vpc/VPCId
    subnet_parameter_store_path: /vpc/globalbank-prod-c001-us-west-2-vpc/PrivateSubnetIds

Creates a dedicated AWS::EC2::SecurityGroup for VPC endpoint traffic plus all 4 VPC endpoints.

SG Provisioner mode — client has both ML Provisioner and SG Provisioner (or a bundle):

ml_product:
  tier: enterprise
  vpc_integration:
    mode: sg-provisioner
    vpc_source: parameter-store
    vpc_parameter_store_path: /vpc/globalbank-prod-c001-us-west-2-vpc/VPCId
    subnet_parameter_store_path: /vpc/globalbank-prod-c001-us-west-2-vpc/PrivateSubnetIds
    sg_parameter_store_path: /sg/globalbank-prod-c001-us-west-2-sg/app/SecurityGroupId

Reads the existing SG ID from SSM Parameter Store. Creates only the 4 VPC endpoints — no new security group created, no conflict with SG Provisioner.

Note: A future bundle combining ML Provisioner and SG Provisioner will be offered. The bundle discount reflects the tighter integration between the two provisioners in enterprise deployments.

Note on S3 Gateway endpoint route table associations: The route_table_parameter_store_path (parameter-store mode) and route_table_ids (direct mode) fields are optional. When left empty, the S3 Gateway VPC endpoint is created without explicit route table associations — the networking team is responsible for associating the endpoint with the appropriate route tables. When populated, the generator includes RouteTableIds in the endpoint resource and associations are configured automatically at deploy time.
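
The optional route table behavior might look like the following sketch. The helper `s3_gateway_endpoint` and the sample IDs are hypothetical; the property names are those of the AWS::EC2::VPCEndpoint resource type.

```python
def s3_gateway_endpoint(region, vpc_id, route_table_ids=None):
    """Return an S3 Gateway endpoint resource; RouteTableIds only when supplied."""
    properties = {
        "VpcEndpointType": "Gateway",
        "ServiceName": f"com.amazonaws.{region}.s3",
        "VpcId": vpc_id,
    }
    if route_table_ids:
        # Populated config: associations are created at deploy time.
        properties["RouteTableIds"] = list(route_table_ids)
    return {"Type": "AWS::EC2::VPCEndpoint", "Properties": properties}


# Sample IDs are invented for illustration.
unassociated = s3_gateway_endpoint("us-west-2", "vpc-0123456789abcdef0")
associated = s3_gateway_endpoint(
    "us-west-2", "vpc-0123456789abcdef0", ["rtb-0aaa111122223333a"]
)
print("RouteTableIds" in unassociated["Properties"],
      "RouteTableIds" in associated["Properties"])
```

Omitting the key entirely, rather than emitting an empty list, leaves the association decision with the networking team.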

Configuration System¶

YAML Configuration File¶

The configuration file is a complete, self-contained YAML file. A client edits it directly to enforce their own settings — no partial overrides or merging logic. Every field is explicit and visible.

client:
  company_name: Global Bank
  company_prefix: globalbank
  account_id: "123456789012"
  tenant_id: "c001"

environment:
  env: prod
  region: us-west-2

ml_product:
  use_case: fraud-detection        # naming parameter only — not a solution
  tier: professional               # starter | professional | enterprise
  source_control: codecommit       # codecommit | s3
  product_name_override: ""        # optional override for auto-generated name
  workload: ""                     # optional discriminator for multiple products

tags:
  cost_center: Fraud Operations
  project: Real-time Credit Card Fraud Detection System
  owner: fraud-ml-engineering-team

Product Naming Convention¶

Format	Pattern	Example
Without workload	`{prefix}-{env}-{tenant}-{region}-{use_case}-ml`	`globalbank-prod-c001-us-west-2-fraud-detection-ml`
With workload	`{prefix}-{env}-{tenant}-{region}-{use_case}-{workload}-ml`	`globalbank-prod-c001-us-west-2-fraud-detection-v2-ml`
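
Both patterns can be derived from the config fields with a few lines of Python. The function `product_name` is a hypothetical helper written for illustration; it does not model `product_name_override`.

```python
def product_name(prefix, env, tenant, region, use_case, workload=""):
    """Join the naming segments, adding the workload only when it is set."""
    parts = [prefix, env, tenant, region, use_case]
    if workload:
        parts.append(workload)
    parts.append("ml")
    return "-".join(parts)


plain = product_name("globalbank", "prod", "c001", "us-west-2", "fraud-detection")
versioned = product_name("globalbank", "prod", "c001", "us-west-2", "fraud-detection", "v2")
print(plain)
print(versioned)
```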

Multi-* System¶

The ML Provisioner is designed from the ground up as a multi-* system. Every dimension of variation is encoded in the configuration file and flows through to resource naming automatically. No special multi-* logic is needed in the tool.

The five dimensions:

Multi-Company (subsidiaries with own company prefix)
  globalbank-prod-c001-us-west-2-fraud-detection-ml
  globalbank-europe-prod-c001-eu-west-1-fraud-detection-ml
  globalbank-asia-prod-c001-ap-southeast-1-fraud-detection-ml

Multi-Tenant (multiple tenants within same AWS account)
  globalbank-prod-c001-us-west-2-fraud-detection-ml  ← tenant c001
  globalbank-prod-c002-us-west-2-fraud-detection-ml  ← tenant c002
  globalbank-prod-c003-us-west-2-fraud-detection-ml  ← tenant c003

Multi-Environment (dev, staging, prod)
  globalbank-dev-c001-us-west-2-fraud-detection-ml
  globalbank-staging-c001-us-west-2-fraud-detection-ml
  globalbank-prod-c001-us-west-2-fraud-detection-ml

Multi-Region
  globalbank-prod-c001-us-west-2-fraud-detection-ml
  globalbank-prod-c001-us-east-1-fraud-detection-ml
  globalbank-prod-c001-eu-west-1-fraud-detection-ml

Multi-Use-Case (within same tenant/env/region)
  globalbank-prod-c001-us-west-2-fraud-detection-ml
  globalbank-prod-c001-us-west-2-customer-churn-ml
  globalbank-prod-c001-us-west-2-demand-forecasting-ml

All five dimensions are handled by the same tool, same config pattern, same 12 commands. Each combination produces a completely isolated CloudFormation stack with its own resources and SSM Parameter Store paths.

The workload discriminator lets a client deploy multiple distinct ML solutions within the exact same company/tenant/env/region/use-case combination without any naming collision. Without it, only one stack is possible per combination. With it, a client can create as many isolated variations as needed:

# Without workload — only one allowed per combination
globalbank-prod-c001-us-west-2-fraud-detection-ml

# With workload — unlimited isolated variations
globalbank-prod-c001-us-west-2-fraud-detection-realtime-ml
globalbank-prod-c001-us-west-2-fraud-detection-batch-ml
globalbank-prod-c001-us-west-2-fraud-detection-cards-ml
globalbank-prod-c001-us-west-2-fraud-detection-loans-ml

Each workload gets its own completely isolated CloudFormation stack, CodeCommit repos, pipelines, artifacts bucket, and SSM Parameter Store paths. Same company, same tenant, same environment, same region, same use case — but four independent ML scaffolding environments for different fraud detection workloads.

Config fields driving each dimension:

Dimension	Config Field
Company	`client.company_prefix`
Tenant	`client.tenant_id`
Environment	`environment.env`
Region	`environment.region`
Use Case	`ml_product.use_case`
Workload	`ml_product.workload`

This is one of the strongest differentiators of the Axon Tech Labs MLOps Suite — a single tool handles the full complexity of a large enterprise with subsidiaries, multiple teams, multiple environments, and multiple regions, all from simple YAML configuration files.

Multi-environment deployments are handled by creating separate configuration files per environment. No special multi-environment logic is needed in the tool — isolation is automatic through resource naming.

Configuration files per environment:

configs/
├── globalbank-dev-c001-us-west-2-fraud-detection-ml.yaml
├── globalbank-staging-c001-us-west-2-fraud-detection-ml.yaml
└── globalbank-prod-c001-us-west-2-fraud-detection-ml.yaml

Each config sets environment.env to dev, staging, or prod respectively. The generator produces three completely isolated CloudFormation stacks:

globalbank-dev-c001-us-west-2-fraud-detection-ml-stack
globalbank-staging-c001-us-west-2-fraud-detection-ml-stack
globalbank-prod-c001-us-west-2-fraud-detection-ml-stack

Each stack has its own isolated resources — CodeCommit repos, CodeBuild projects, CodePipeline pipelines, S3 artifacts bucket — and its own SSM Parameter Store paths under /ml/globalbank-{env}-c001-.../.

Tiers can differ per environment — a common and recommended pattern:

Environment	Tier	Rationale
dev	starter	Cheap, fast iteration, no compliance overhead
staging	professional	Mirrors prod, event-driven approval gate, monitoring
prod	enterprise	VPC-only, KMS encryption, compliance logging, permission boundaries

This pattern creates a natural upgrade path and keeps costs low in non-production environments while maintaining full enterprise controls in production.

Note on licensing: The ML Provisioner license is enforced per AWS account via AWS License Manager. Each AWS account requires its own Marketplace subscription. Clients running all environments in a single AWS account need only one subscription. Clients following the recommended account-per-environment isolation pattern will need one subscription per account — this is consistent with standard AWS Marketplace licensing across all IaC tools.

Each environment is deployed independently:

# Deploy dev (starter tier)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  -e AWS_PROFILE=dev-profile \
  ml-provisioner:starter \
  -con globalbank-dev-c001-us-west-2-fraud-detection-ml.yaml \
  -act deploy-product --force

# Deploy staging (professional tier)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  -e AWS_PROFILE=staging-profile \
  ml-provisioner:professional \
  -con globalbank-staging-c001-us-west-2-fraud-detection-ml.yaml \
  -act deploy-product --force

# Deploy prod (enterprise tier)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  -e AWS_PROFILE=prod-profile \
  ml-provisioner:enterprise \
  -con globalbank-prod-c001-us-west-2-fraud-detection-ml.yaml \
  -act deploy-product --force

See VPC Integration Modes in the Enterprise Tier section above. The vpc_integration block in the config replaces the simple vpc_source field and supports both standalone and SG Provisioner integration modes.

CloudFormation Generation¶

The cfn_generator.py module generates CloudFormation templates from two inputs — the selected tier blueprint and the client YAML configuration. The blueprint defines structure. The client config provides identity. They meet only inside the generator — no string substitution, no placeholders.

Generation Flow¶

YAML Config
    ↓
ConfigLoader (parses YAML, resolves paths)
    ↓
ConfigValidator (validates against tier JSON schema)
    ↓
ProductLoader (loads tier blueprint)
    ↓
ProductValidator (security + schema checks)
    ↓
CfnGenerator (constructs CFN as Python dicts, dumps to YAML)
    ↓
CloudFormation Template (saved to templates/)

Security Validation¶

The ProductValidator runs before template generation and blocks or warns on dangerous patterns. Checks are scoped strictly to resources provisioned by ML Provisioner.

Blocking Checks (generation halted)¶

IAM:

IAM roles with * resource and no condition — enforces Principle of Least Privilege
Inline IAM policies — blocked in favor of managed policies for better versioning and reusability
Hardcoded credentials in generated template — scans for plaintext AWS access key patterns (AKIA...) before saving

Storage:

Public S3 buckets (enterprise tier) — blocks PublicAccessBlockConfiguration disabled
Missing KMS encryption (enterprise tier) — Customer Managed Keys required for auditability and control

Compute:

CodeBuild projects with privileged mode enabled without justification — prevents root-level access to host Docker daemon, a common privilege escalation vector

Networking:

SSH (port 22) or RDP (port 3389) open to 0.0.0.0/0 on endpoint SecurityGroup (enterprise standalone mode)

Logging:

CloudWatch LogGroup retention below 90 days (enterprise tier) — enforces minimum retention for audit and incident response

Warning Checks (generation proceeds with warning)¶

IAM:

Roles containing high-risk actions: iam:PassRole, iam:CreateAccessKey, s3:DeleteBucket — warned rather than blocked because iam:PassRole is legitimately required for SageMaker execution roles. Warning includes justification guidance

Tags:

Missing required tags on any taggable resource — essential for ABAC (Attribute-Based Access Control) and governance
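
A few of these checks are sketched below to show the blocking versus warning split. The function names and return shapes are hypothetical and the real ProductValidator may be structured differently. The access key pattern (`AKIA` followed by 16 uppercase letters or digits) is a commonly used heuristic.

```python
import re

ACCESS_KEY_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")
HIGH_RISK_ACTIONS = {"iam:PassRole", "iam:CreateAccessKey", "s3:DeleteBucket"}
MIN_LOG_RETENTION_DAYS = 90


def as_list(value):
    """IAM fields may be a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def check_statement(statement):
    """Return (blocking, warnings) findings for one IAM policy statement."""
    blocking, warnings = [], []
    if "*" in as_list(statement.get("Resource", [])) and not statement.get("Condition"):
        blocking.append("wildcard resource without condition")
    for action in as_list(statement.get("Action", [])):
        if action in HIGH_RISK_ACTIONS:
            warnings.append(f"high-risk action: {action}")
    return blocking, warnings


def contains_access_key(template_text):
    """True when the rendered template contains an access-key-like token."""
    return ACCESS_KEY_PATTERN.search(template_text) is not None


def retention_too_short(retention_days):
    """True when a log group retention is below the enterprise minimum."""
    return retention_days < MIN_LOG_RETENTION_DAYS


findings = check_statement({"Effect": "Allow", "Action": ["iam:PassRole"], "Resource": "*"})
print(findings)
```

A caller would halt generation when the first list is non-empty and print the second list as warnings.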

Out of Scope¶

VPC Flow Logs — VPC Provisioner responsibility. ML Provisioner does not create VPCs
Security Groups for application tiers — SG Provisioner responsibility
RDS PubliclyAccessible — ML Provisioner does not provision RDS
Load Balancer / CloudFront HTTPS enforcement — ML Provisioner does not provision these resources
EC2 public IP assignment — ML Provisioner does not provision EC2 instances
S3 account-level public access blocks — ML Provisioner does not modify account-level settings

For the full technical reference, including blueprint schema, generation algorithm, client data injection, naming conventions, conditional generation logic, and concrete examples, see CFN_GENERATOR.md.

SSM Parameter Store Integration¶

All deployed resource identifiers are stored in SSM Parameter Store at deployment time under the path /ml/{product-name}/, where {product-name} is derived from the configuration as:

{company_prefix}-{env}-{tenant_id}-{region}-{use_case}[-{workload}]-ml

Example (globalbank enterprise deployment):

/ml/globalbank-prod-c001-us-west-2-demand-forecasting-ml/ModelPackageGroupArn
/ml/globalbank-prod-c001-us-west-2-demand-forecasting-ml/RepositoryUrl
/ml/globalbank-prod-c001-us-west-2-demand-forecasting-ml/BucketName
...

Full parameter list by tier:

/ml/{product-name}/ModelPackageGroupArn           (all tiers)
/ml/{product-name}/RepositoryUrl                  (codecommit only)
/ml/{product-name}/BucketName                     (professional + enterprise)
/ml/{product-name}/DashboardName                  (professional + enterprise)
/ml/{product-name}/KmsKeyArn                      (enterprise only)
/ml/{product-name}/LogGroupName                   (enterprise only)
/ml/{product-name}/SecurityGroupId                (enterprise standalone mode only)
/ml/{product-name}/VpcEndpointIdSagemakerApi       (enterprise only)
/ml/{product-name}/VpcEndpointIdSagemakerRuntime   (enterprise only)
/ml/{product-name}/VpcEndpointIdS3                (enterprise only)
/ml/{product-name}/VpcEndpointIdSts               (enterprise only)

These paths are available for consumption by downstream tooling — such as a SageMaker Provisioner — to configure Studio domains and projects without manual cross-referencing.
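
A downstream consumer could compute which paths to expect for a given scenario. The sketch below encodes the parameter list above; `ssm_parameter_paths` is a hypothetical helper, not a function exposed by the tool.

```python
def ssm_parameter_paths(product_name, tier, source_control="codecommit",
                        vpc_mode="standalone"):
    """Return the SSM paths published for one tier/source/VPC-mode scenario."""
    names = ["ModelPackageGroupArn"]
    if source_control == "codecommit":
        names.append("RepositoryUrl")
    if tier in ("professional", "enterprise"):
        names += ["BucketName", "DashboardName"]
    if tier == "enterprise":
        names += [
            "KmsKeyArn", "LogGroupName",
            "VpcEndpointIdSagemakerApi", "VpcEndpointIdSagemakerRuntime",
            "VpcEndpointIdS3", "VpcEndpointIdSts",
        ]
        if vpc_mode == "standalone":
            names.append("SecurityGroupId")
    return [f"/ml/{product_name}/{name}" for name in names]


NAME = "globalbank-prod-c001-us-west-2-demand-forecasting-ml"
paths = ssm_parameter_paths(NAME, "enterprise")
print(len(paths))  # 11
```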

Actions Reference¶

All actions require AWS credentials for subscription validation. Actions marked Mutating additionally require --force.

Action	AWS Calls	--force	Purpose
`validate-config`	subscription only	❌	Validate YAML schema and field values
`list-products`	subscription only	❌	List available tier templates
`show-product`	subscription only	❌	Display tier resources and configuration
`create-policy`	subscription only	❌	Generate least-privilege IAM policy
`create-prov-template`	subscription only	❌	Generate CloudFormation template
`validate-prov-template`	subscription only	❌	Validate template locally
`create-review-report`	subscription only	❌	Generate pre-deployment HTML report
`show-changes`	read-only	❌	Preview changes against deployed stack
`check-drift`	read-only	❌	Detect infrastructure drift
`test-deploy`	read-only	❌	Deploy with isolated suffix for testing
`deploy-product`	mutating	✅	Deploy ML product infrastructure
`delete-product`	mutating	✅	Delete stack and all resources

Source Tree¶

packages/ml-provisioner-tool/
├── configs
│   ├── edge-prod-b001-us-west-2-fraud-detection-ml-codecommit-workload.yaml
│   ├── edge-prod-b001-us-west-2-fraud-detection-ml-codecommit.yaml
│   ├── edge-prod-b001-us-west-2-fraud-detection-ml-s3-workload.yaml
│   ├── edge-prod-b001-us-west-2-fraud-detection-ml-s3.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-direct.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-ssm.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-direct-rtb.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-direct.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm-workload.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-sgprov-direct.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-sgprov-ssm.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-direct-rtb.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-direct.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-ssm-workload.yaml
│   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-ssm.yaml
│   ├── techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit-workload.yaml
│   ├── techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit.yaml
│   ├── techcorp-prod-a001-us-west-2-customer-churn-ml-s3-workload.yaml
│   ├── techcorp-prod-a001-us-west-2-customer-churn-ml-s3.yaml
│   └── examples
│       ├── enterprise
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-direct.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-ssm.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-direct-rtb.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-direct.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm-workload.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-sgprov-direct.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-sgprov-ssm.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-direct-rtb.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-direct.yaml
│       │   ├── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-ssm-workload.yaml
│       │   └── globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-ssm.yaml
│       ├── professional
│       │   ├── edge-prod-b001-us-west-2-fraud-detection-ml-codecommit-workload.yaml
│       │   ├── edge-prod-b001-us-west-2-fraud-detection-ml-codecommit.yaml
│       │   ├── edge-prod-b001-us-west-2-fraud-detection-ml-s3-workload.yaml
│       │   └── edge-prod-b001-us-west-2-fraud-detection-ml-s3.yaml
│       └── starter
│           ├── techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit-workload.yaml
│           ├── techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit.yaml
│           ├── techcorp-prod-a001-us-west-2-customer-churn-ml-s3-workload.yaml
│           └── techcorp-prod-a001-us-west-2-customer-churn-ml-s3.yaml
├── docker
│   ├── Dockerfile
│   └── entrypoint.sh
├── docs
│   ├── sphinx
│   │   └── source
│   │       ├── conf.py
│   │       ├── index.rst
│   │       └── onboarding
│   ├── APPLICATION_ARCHITECTURE.md
│   ├── CFN_GENERATOR.md                      # internal
│   ├── CFN_GENERATOR_IMPLEMENTATION_STEPS.md # internal
│   ├── CONFIGURATION.md
│   ├── CONFIGURATION_GUIDE.md
│   ├── FEEDBACK.md
│   ├── IAM_PERMISSIONS.md
│   ├── INTEGRATION_EXAMPLES.md
│   ├── MIGRATION_GUIDE.md
│   ├── NAMING_CONVENTIONS.md
│   ├── PREREQUISITES.md
│   ├── README.md
│   ├── RELEASE_NOTES.md
│   ├── RESOURCES_EXPLAINED.md
│   ├── ROADMAP.md
│   ├── SAMPLE_REPORTS.md
│   ├── SECURITY_GUIDELINES.md
│   ├── SUPPORT.md
│   ├── TROUBLESHOOTING.md
│   ├── UPDATE_PROCEDURES.md
│   └── USER_GUIDE.md
├── policies
├── reports
├── schemas
│   ├── products
│   │   ├── enterprise.yaml
│   │   ├── professional.yaml
│   │   └── starter.yaml
│   ├── validation-schema-enterprise.yaml
│   ├── validation-schema-professional.yaml
│   ├── validation-schema-starter.yaml
│   └── validation-schema.yaml
├── src
│   └── ml_provisioner
│       ├── __init__.py
│       ├── __main__.py
│       ├── cli.py
│       ├── config
│       │   ├── __init__.py
│       │   ├── app_config.yaml
│       │   └── loader.py
│       ├── core
│       │   ├── __init__.py
│       │   └── ml_manager.py
│       ├── generators
│       │   ├── __init__.py
│       │   └── cfn_generator.py
│       ├── license
│       │   ├── __init__.py
│       │   └── validator.py
│       ├── models
│       │   ├── __init__.py
│       │   └── product.py
│       ├── products
│       │   ├── __init__.py
│       │   ├── loader.py
│       │   └── validator.py
│       └── utils
│           ├── __init__.py
│           ├── html_generator.py
│           └── review_report.py
├── templates
├── tests
├── LICENSE.txt
├── README.MD
├── Makefile
├── pyproject.toml
├── setup.py
└── uv.lock

Future Roadmap¶

See the Roadmap (docs/ROADMAP.md) for planned features and deferred enhancements.
