README

Enterprise-grade ML product infrastructure provisioning for AWS: tier-based CloudFormation deployment with SageMaker, CodePipeline, CodeBuild, KMS, and VPC integration.


The Purpose

ML Provisioner is an infrastructure-as-code tool that streamlines the deployment and management of machine learning environments on AWS. Using AWS CloudFormation, it lets teams deploy standardized, tier-based ML infrastructure, from isolated starter environments to enterprise-grade setups integrated into existing VPC networks. The tool enforces security best practices out of the box through automatic least-privilege IAM policy generation, pre-deployment configuration validation, and change-set previews. It also detects infrastructure drift against deployed stacks, supports isolated test deployments, and automatically publishes resource identifiers to AWS SSM Parameter Store so that downstream applications and CI/CD pipelines can consume them.

Key Features

  • Tiered Architecture: Supports Starter, Professional, and Enterprise configurations.

  • VPC Integration: Seamlessly attaches to existing enterprise VPCs (enterprise tier).

  • Automated IAM: Generates context-aware, least-privilege policies out of the box.

  • Safety First: Pre-deployment template validation and change-set previews.

  • Drift Detection: Monitors and flags infrastructure drift against active, deployed stacks.

  • Dry Runs: Safe test deployments with isolated resource names.

  • Downstream Readiness: Automatically publishes resource identifiers to AWS SSM Parameter Store.


Prerequisites & Installation

Requirements

| Requirement | Version | Notes |
|---|---|---|
| Docker | 20.10+ | Required to run the ML Provisioner CLI |
| AWS CLI | 2.x | Required for credential configuration and AWS resource verification |
| AWS Account | n/a | With permissions to create CloudFormation, SageMaker, CodePipeline, CodeBuild, IAM, S3, and SSM resources |
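
The version requirements above can be checked before pulling an image. The following is a minimal bash sketch, not part of ML Provisioner: `version_ge` and `check_prerequisites` are hypothetical helper names, and the comparison assumes a `sort` that supports version sorting (`-V`, as in GNU coreutils).

```bash
# Sketch only: version_ge and check_prerequisites are hypothetical helpers,
# not commands shipped with ML Provisioner.

# Succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints one line per tool and returns non-zero if a requirement is unmet.
check_prerequisites() {
  local status=0 docker_version aws_version

  # Client version only, so the check works even if the daemon is stopped.
  docker_version="$(docker version --format '{{.Client.Version}}' 2>/dev/null)"
  if version_ge "$docker_version" 20.10; then
    echo "docker ${docker_version}: ok"
  else
    echo "docker ${docker_version:-not found}: need 20.10+" >&2
    status=1
  fi

  # `aws --version` output starts with "aws-cli/<version>".
  aws_version="$(aws --version 2>&1 | sed -n 's|^aws-cli/\([0-9.]*\).*|\1|p')"
  if version_ge "$aws_version" 2; then
    echo "aws-cli ${aws_version}: ok"
  else
    echo "aws-cli ${aws_version:-not found}: need 2.x" >&2
    status=1
  fi

  return "$status"
}
```

Running `check_prerequisites` after sourcing the snippet reports both tools. It does not verify the AWS account permissions listed in the table.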

Installation

No installation is required. ML Provisioner is distributed as a Docker image via AWS Marketplace. Pull the image for your purchased tier:

# Starter tier
docker pull ml-provisioner:starter

# Professional tier
docker pull ml-provisioner:professional

# Enterprise tier
docker pull ml-provisioner:enterprise

AWS Credentials Setup

# Configure AWS CLI
aws configure

# Verify credentials
aws sts get-caller-identity

Working Directory Setup

Create the required directory structure:

mkdir -p ml/{configs,policies,templates,reports,docs}

Copy the documentation from the Docker image to ml/docs/. Replace ":starter" with ":professional" or ":enterprise" to match your tier.

docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/docs:/output \
  --entrypoint cp \
  ml-provisioner:starter \
  -r /app/docs/. /output/

Copy the example configuration files from the Docker image to ml/configs/.

  • For starter tier

docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs \
  --entrypoint cp \
  ml-provisioner:starter \
  -r /app/examples/configs/. /app/configs/

  • For professional tier

docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs \
  --entrypoint cp \
  ml-provisioner:professional \
  -r /app/examples/configs/. /app/configs/

  • For enterprise tier

docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs \
  --entrypoint cp \
  ml-provisioner:enterprise \
  -r /app/examples/configs/. /app/configs/
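
The three commands above differ only in the image tag, so they can be folded into one helper. This is a bash sketch under that assumption: `copy_example_configs` and the `MLP_DRY_RUN` variable are hypothetical names introduced here, while the mounts, entrypoint, and paths are copied from the commands above.

```bash
# Sketch only: copy_example_configs and MLP_DRY_RUN are hypothetical names.

# Copies the example configs for the given tier into ./ml/configs.
# With MLP_DRY_RUN set, the docker command is printed instead of executed.
copy_example_configs() {
  local tier="$1"
  case "$tier" in
    starter|professional|enterprise) ;;
    *) echo "unknown tier: ${tier}" >&2; return 2 ;;
  esac

  local cmd=(docker run --rm
    -v "${HOME}/.aws:/home/mluser/.aws:ro"
    -v "$(pwd)/ml/configs:/app/configs"
    --entrypoint cp
    "ml-provisioner:${tier}"
    -r /app/examples/configs/. /app/configs/)

  if [ -n "${MLP_DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `MLP_DRY_RUN=1 copy_example_configs professional` prints the command for review, and `copy_example_configs professional` runs it.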

Architecture & Tier Matrix

| Feature | Starter Tier | Professional Tier | Enterprise Tier |
|---|---|---|---|
| Target Use Case | Sandbox & PoC | Team Development | Production Workloads |
| Docker Image | ml-provisioner:starter | ml-provisioner:professional | ml-provisioner:enterprise |
| Network | Default AWS VPC | Default AWS VPC | Existing Custom VPC |
| VPC Endpoints | ❌ | ❌ | ✅ (SageMaker, S3, STS) |
| Source Control | CodeCommit / S3 | CodeCommit / S3 | CodeCommit / S3 |
| S3 Artifacts Bucket | ✅ | ✅ | ✅ |
| EventBridge Rule | ❌ | ✅ | ✅ |
| CloudWatch Dashboard | ❌ | ✅ | ✅ |
| KMS Encryption | ❌ | ❌ | ✅ |
| Compliance Log Group | ❌ | ❌ | ✅ |
| CloudWatch Alarms | ❌ | ❌ | ✅ |
| SNS Security Alerts | ❌ | ❌ | ✅ |
| IAM Security | Standard | Strict Automated | Strict + Permission Boundary |
| IAM Managed Policies | ❌ | ✅ (build, deploy) | ✅ (build, deploy, boundary) |
| SSM Parameter Store Outputs | ✅ | ✅ | ✅ |
| CFN Resources (codecommit) | 13 | 19 | 39–41 |
| CFN Resources (s3) | 10 | 16 | 36–38 |

Quick Start

1. Configuration File Selection

For each category, pick the option that matches your setup. Combine the tokens to identify your config file.

| Category | Option | Token |
|---|---|---|
| Tier | starter | techcorp |
| Tier | professional | edge |
| Tier | enterprise | globalbank |
| Source Control | CodeCommit | codecommit |
| Source Control | S3 | s3 |
| VPC Resolution (enterprise only) | SSM Parameter Store | ssm |
| VPC Resolution (enterprise only) | Direct VPC ID | direct |
| SG Resolution (enterprise only) | Create on the fly | standalone |
| SG Resolution (enterprise only) | Existing via SG-provisioner | sgprov |
| Workload Discriminator | Yes | workload |
| Workload Discriminator | No | (no workload in file name) |

Example: Enterprise + CodeCommit + Direct VPC + Standalone SG + no workload

| Category | Selected option | Token |
|---|---|---|
| Tier | enterprise | globalbank |
| Source Control | CodeCommit | codecommit |
| VPC Resolution | Direct VPC ID | direct |
| SG Resolution | Create on the fly | standalone |
| Workload Discriminator | No | (none) |

→ globalbank-...-demand-forecasting-ml-codecommit-standalone-direct.yaml
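
The token combination can also be scripted. The bash sketch below builds only the trailing part of the file name. `config_suffix` is a hypothetical helper, and the token order (source control, then SG resolution, then VPC resolution) is inferred from the single example file name above. The workload token is left out because its position in the file name is not shown here.

```bash
# Sketch only: config_suffix is a hypothetical helper. Token order is
# inferred from the example name "...-ml-codecommit-standalone-direct.yaml".

# Usage: config_suffix <tier> <source-control> [<sg-resolution> <vpc-resolution>]
# Prints the token suffix that follows "-ml-" in the config file name.
config_suffix() {
  local tier="$1" scm="$2" sg="${3:-}" vpc="${4:-}"

  case "$scm" in
    codecommit|s3) ;;
    *) echo "unknown source control token: ${scm}" >&2; return 2 ;;
  esac

  # Starter and professional names carry only the source control token.
  if [ "$tier" != "enterprise" ]; then
    printf '%s\n' "$scm"
    return 0
  fi

  case "$sg" in
    standalone|sgprov) ;;
    *) echo "enterprise needs an SG token (standalone|sgprov)" >&2; return 2 ;;
  esac
  case "$vpc" in
    ssm|direct) ;;
    *) echo "enterprise needs a VPC token (ssm|direct)" >&2; return 2 ;;
  esac

  printf '%s-%s-%s\n' "$scm" "$sg" "$vpc"
}
```

A listing such as `ls ml/configs/*-ml-"$(config_suffix enterprise codecommit standalone direct)".yaml` would then narrow the example files to the matching one, assuming the examples were copied into ml/configs.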

2. Configuration File Adjustment

For all tiers, replace these fields:

| Field | Location | Example |
|---|---|---|
| company_name | client.company_name | TechCorp |
| company_prefix | client.company_prefix | techcorp |
| account_id | client.account_id | "123456789012" |
| tenant_id | client.tenant_id | "a001" |
| region | environment.region | us-west-2 |
| use_case | ml_product.use_case | customer-churn |
| cost_center, project, owner | tags | your org values |

Professional and Enterprise tiers, additionally:

| Field | Notes |
|---|---|
| alerts_email | Email for CloudWatch alarm notifications |
| log_retention_days | Optional, minimum 90 |

Enterprise tier only, additionally:

| Field | Notes |
|---|---|
| vpc_integration.vpc_id | Your VPC ID |
| vpc_integration.subnet_ids | At least 2 subnet IDs |
| vpc_integration.route_table_ids | Required only for sgprov VPC source mode |
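
A few of the constraints above can be checked by hand before running validate-config, which remains the authoritative check. The bash helpers below are hypothetical and cover only three rules: the 12-digit form of an AWS account ID, the stated minimum of 90 for log_retention_days, and the minimum of two subnet IDs.

```bash
# Sketch only: these helpers are hypothetical and do not replace
# `-act validate-config`.

# AWS account IDs are exactly 12 digits.
valid_account_id() {
  [[ "$1" =~ ^[0-9]{12}$ ]]
}

# log_retention_days is optional; when set, it must be at least 90.
valid_retention_days() {
  [[ "$1" =~ ^[0-9]+$ ]] && [ "$1" -ge 90 ]
}

# vpc_integration.subnet_ids needs at least two entries.
# Pass each subnet ID as a separate argument.
enough_subnets() {
  [ "$#" -ge 2 ]
}
```

For example, `valid_account_id "$(aws sts get-caller-identity --query Account --output text)"` confirms that the active credentials resolve to a well-formed account ID. The quotes around "123456789012" in the table matter in YAML: an unquoted value is parsed as a number, which would drop any leading zeros.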

3. Common Workflow

Set your variables first:

CONFIG=techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit.yaml
IMAGE=starter

Run the commands in sequence:

# List available tiers
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act list-products

# Show resources for selected tier
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act show-product

# Validate configuration against tier schema
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act validate-config

# Generate least-privilege IAM policy
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/policies:/app/policies \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-policy

# Generate CloudFormation provisioning template
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-prov-template

# Validate generated template locally
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act validate-prov-template

# Generate pre-deployment HTML review report
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-review-report

# Test deployment with isolated random suffix (recommended before production deploy)
# The test stack name is printed upon completion; note it down
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act test-deploy

# Delete the test stack once verified (replace <test-stack-name> with the name printed above)
aws cloudformation delete-stack --stack-name <test-stack-name>
aws cloudformation wait stack-delete-complete --stack-name <test-stack-name>

# Deploy ML product infrastructure via CloudFormation
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act deploy-product --force

# Preview changes against deployed stack
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act show-changes

# Check infrastructure drift
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act check-drift

# Delete ML product infrastructure
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act delete-product --force
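
Because every step repeats the same `docker run` skeleton, a small wrapper can cut down on copy-paste errors. The bash sketch below is one possible shape: `mlp`, `mlp_extra_dirs`, and `MLP_DRY_RUN` are hypothetical names, and the per-action mounts mirror the commands listed above.

```bash
# Sketch only: mlp, mlp_extra_dirs, and MLP_DRY_RUN are hypothetical names.

# Prints the extra ml/ subdirectory an action mounts, if any.
mlp_extra_dirs() {
  case "$1" in
    create-policy)
      echo policies ;;
    create-prov-template|validate-prov-template|create-review-report|deploy-product|show-changes)
      echo templates ;;
  esac
}

# Usage: mlp <action> [extra CLI flags, e.g. --force]
# Requires IMAGE and CONFIG to be set as in "Set your variables first".
mlp() {
  if [ "$#" -lt 1 ] || [ -z "${IMAGE:-}" ] || [ -z "${CONFIG:-}" ]; then
    echo "usage: IMAGE=<tier> CONFIG=<file> mlp <action> [flags]" >&2
    return 2
  fi
  local action="$1" dir
  shift

  local cmd=(docker run --rm
    -v "${HOME}/.aws:/home/mluser/.aws:ro"
    -v "$(pwd)/ml/configs:/app/configs:ro")
  for dir in $(mlp_extra_dirs "$action"); do
    cmd+=(-v "$(pwd)/ml/${dir}:/app/${dir}")
  done
  cmd+=(-v "$(pwd)/ml/reports:/app/reports"
    "ml-provisioner:${IMAGE}" -con "${CONFIG}" -act "$action" "$@")

  # With MLP_DRY_RUN set, print the command instead of running it.
  if [ -n "${MLP_DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

With this in place, `mlp validate-config && mlp create-prov-template && mlp validate-prov-template` runs the first generation steps in order. Stopping at the first failure relies on the CLI returning a non-zero exit code on error, which is an assumption here.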

AWS Credentials

Option 1: AWS Profile (Recommended)

-v ~/.aws:/home/mluser/.aws:ro

Option 2: Environment Variables

-e AWS_ACCESS_KEY_ID=<access_key> \
-e AWS_SECRET_ACCESS_KEY=<secret_key> \
-e AWS_DEFAULT_REGION=<aws-region>

Option 3: IAM Role (when running on EC2/ECS)

# No credentials needed; the instance role is used

Applies to all tiers. Most commonly used in enterprise CI/CD pipelines where the ML Provisioner runs from an EC2 instance or ECS task with an attached IAM role. Ensure the instance role has the permissions generated by create-policy.
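
The three options can be selected automatically in a script. The bash sketch below prefers the recommended profile mount, falls back to environment variables, and otherwise passes nothing so that an instance role can apply. `aws_cred_args` is a hypothetical helper. It uses the `-e NAME` form of `docker run`, which forwards the host's value without placing the secret on the command line.

```bash
# Sketch only: aws_cred_args is a hypothetical helper.

# Prints the docker arguments for AWS credentials, one per line.
aws_cred_args() {
  if [ -d "${HOME}/.aws" ]; then
    # Option 1: mount the local AWS profile directory read-only.
    printf '%s\n' -v "${HOME}/.aws:/home/mluser/.aws:ro"
  elif [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    # Option 2: forward the variables by name; docker copies the host values.
    printf '%s\n' -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION
  fi
  # Option 3: print nothing, so the EC2/ECS role is used.
}
```

In bash 4 or later, the output can be collected with `mapfile -t creds < <(aws_cred_args)` and spliced into a command as `docker run --rm "${creds[@]}" ...`.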


What Gets Created

When you run deploy-product, the tool creates a CloudFormation stack containing:

Starter Tier

  • SageMaker Model Package Group (Model Registry)

  • CodeCommit repositories (model-build, model-deploy), or an S3 source

  • CodeBuild projects (build, deploy)

  • CodePipeline pipelines (build-pipeline, deploy-pipeline)

  • IAM roles (codebuild, pipeline, sagemaker execution)

  • SSM Parameter Store outputs (ModelPackageGroupArn, RepositoryUrl)

Professional Tier (all Starter plus)

  • S3 artifacts bucket

  • EventBridge rule for automated pipeline triggers

  • CloudWatch dashboard

  • IAM managed policies (build, deploy)

  • SSM Parameter Store outputs (BucketName, DashboardName)

Enterprise Tier (all Professional plus)

  • KMS key + alias for encryption

  • VPC endpoints (SageMaker API, SageMaker Runtime, S3, STS)

  • EC2 Security Group (standalone mode only)

  • CloudWatch compliance log group + metric filters + alarms

  • SNS topic + subscription for security alerts

  • IAM permission boundary policy

  • SSM Parameter Store outputs (KmsKeyArn, LogGroupName, VpcEndpointIdSagemakerApi, VpcEndpointIdSagemakerRuntime, VpcEndpointIdS3, VpcEndpointIdSts, SecurityGroupId*)

* SecurityGroupId published in standalone mode only.

All resources are tagged and managed as a single CloudFormation stack for easy audit and cleanup.
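
Downstream scripts can read the published outputs with the AWS CLI. The bash sketch below uses `aws ssm get-parameters-by-path`. The parameter path prefix is not documented in this README, so it is passed in as an argument. The sketch also assumes that each parameter name ends in the output name (for example `/KmsKeyArn`), which should be confirmed against a deployed stack. `list_stack_parameters` and `parameter_value` are hypothetical helper names.

```bash
# Sketch only: helper names are hypothetical, and the parameter path layout
# is an assumption; check the names actually published by your stack.

# Prints "name<TAB>value" for every parameter under the given path prefix.
list_stack_parameters() {
  local prefix="$1"
  aws ssm get-parameters-by-path \
    --path "$prefix" \
    --recursive \
    --query 'Parameters[].[Name,Value]' \
    --output text
}

# Prints the value of the first parameter whose name ends in "/<key>".
parameter_value() {
  local prefix="$1" key="$2"
  list_stack_parameters "$prefix" | awk -F '\t' -v key="$key" '
    { suffix = "/" key; n = length(suffix) }
    length($1) >= n && substr($1, length($1) - n + 1) == suffix { print $2; exit }
  '
}
```

A call such as `parameter_value <your-prefix> ModelPackageGroupArn` would return the Model Registry ARN for use in a pipeline step, with `<your-prefix>` replaced by the path your deployment uses.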


Directory Structure

ml/
├── configs/       # ML product configuration files (YAML)
├── policies/      # Generated IAM policies (JSON)
├── templates/     # Generated CloudFormation templates (YAML)
├── reports/       # Execution logs and HTML reports
└── docs/          # Documentation extracted from the Docker image

Accessing Documentation

Documentation is available online at https://docs.axontechlabs.com/ml/index.html.

It is also embedded in the Docker image and can be extracted locally as described in Working Directory Setup.

# Verify extraction
ls ml/docs/

Open ml/docs/index.html in your browser to view the full documentation offline.


Quick Troubleshooting

Config validation fails

  • Check the error message for the specific field causing the failure

  • Verify tier in config matches your purchased image tag

Template file not found at deploy time

  • Run create-prov-template before deploy-product

  • Ensure -v $(pwd)/ml/templates:/app/templates is in the Docker command

Stack already exists

  • Use delete-product --force to remove the existing stack

  • Or use show-changes to preview what would change

VPC prerequisites missing (enterprise tier)

  • Follow the prerequisite checks in the Configuration Guide

  • Ensure the VPC, SSM parameters, and security group stack are in place before deploying

Permission denied

  • Run create-policy to generate the required IAM policy

  • Verify AWS credentials: aws sts get-caller-identity
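
The "Template file not found at deploy time" case can be caught before calling deploy-product. The bash sketch below checks that the templates directory holds at least one generated file. `has_template` is a hypothetical helper, and the `.yaml`/`.yml` extensions are an assumption based on the templates being described as YAML.

```bash
# Sketch only: has_template is a hypothetical helper; the file extensions
# are assumed from "Generated CloudFormation templates (YAML)".

# Succeeds when the directory contains at least one YAML file.
# Defaults to ml/templates relative to the current directory.
has_template() {
  local dir="${1:-ml/templates}" f
  for f in "$dir"/*.yaml "$dir"/*.yml; do
    # An unmatched glob stays literal and fails the -f test.
    [ -f "$f" ] && return 0
  done
  return 1
}
```

A guard such as `has_template || echo "run create-prov-template first" >&2` placed before the deploy step makes the missing prerequisite explicit.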


License

This product requires a valid AWS Marketplace subscription.