User Guide¶

All commands are run from the mlops-infra-suite/ root directory.

Table of Contents¶

Pre-Deployment Checklist
Quick Reference — Actions by Safety Level
Prerequisites
Configuration
Business Workflow
Scenario Matrix
Multi-Environment Deployment
Volume Mounts
AWS Credentials
Best Practices
FAQ

Quick Reference — Actions by Safety Level¶

Local Actions (No AWS Calls besides subscription check)¶

Action	Description
`validate-config`	Validate configuration YAML against schema
`list-products`	List available tier templates
`show-product`	Show resources for the selected tier
`create-policy`	Generate least-privilege IAM policy
`create-prov-template`	Generate CloudFormation provisioning template
`validate-prov-template`	Validate generated template locally
`create-review-report`	Generate pre-deployment HTML review report

Read-Only AWS Actions¶

Action	Description
`show-changes`	Preview what would change in the deployed stack
`check-drift`	Detect infrastructure drift against deployed stack
`test-deploy`	Deploy a separate stack with a test suffix for isolated testing; the test stack is a real stack and must be deleted once verified

Mutating AWS Actions (`--force` required)¶

Action	Description
`deploy-product`	Deploy ML product infrastructure via CloudFormation
`delete-product`	Delete CloudFormation stack and all associated resources
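As a rough illustration of the split above, a wrapper script could refuse to run a mutating action unless `--force` is passed explicitly. The function names `requires_force` and `guard_action` are hypothetical and not part of ML Provisioner; only the action names come from the tables.

```shell
# Hypothetical pre-flight guard; only the action names come from this guide.

# Succeeds (exit 0) when the action is listed as mutating.
requires_force() {
  case "$1" in
    deploy-product|delete-product) return 0 ;;
    *) return 1 ;;
  esac
}

# Fails when a mutating action is requested without --force.
guard_action() {
  action=$1
  force=${2:-}
  if requires_force "$action" && [ "$force" != "--force" ]; then
    echo "refusing $action without --force" >&2
    return 1
  fi
}
```

A CI job might call `guard_action "$ACTION" "$FLAG"` before the `docker run` line so that an accidental `deploy-product` never reaches AWS.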

Prerequisites¶

See README for full prerequisites and installation instructions.

Configuration¶

For configuration file structure and field reference see Configuration Reference. For scenario-based guidance on selecting and populating the right config file see Configuration Guide.

Business Workflow¶

The commands below represent the complete lifecycle of an ML product deployment. Run them in order:

Steps 1–7 are local and require no AWS calls besides the subscription check
Steps 8–12 call AWS and need credentials with the relevant service permissions; for the enterprise tier, the VPC must also be deployed and available beforehand (see README for prerequisites)

Scenario Matrix¶

Dimension Values¶

Dimension	Value	Meaning
`source_control`	`codecommit`	AWS CodeCommit repositories are created for model-build and model-deploy source code
	`s3`	An existing S3 bucket is used as the pipeline source — no CodeCommit repositories are created
`vpc_mode`	`standalone`	ML Provisioner creates and manages its own endpoint Security Group
	`sgprov`	Security Group is managed externally by SG Provisioner — ML Provisioner skips SG creation and reads the existing SG ID from SSM
`vpc_source`	`ssm`	VPC ID and subnet IDs are resolved at deploy time from SSM Parameter Store paths — typically populated by VPC Provisioner
	`direct`	VPC ID and subnet IDs are hardcoded directly in the configuration file
`workload`	empty	No workload discriminator — `ml_name` follows standard pattern
	`realtime`	Workload discriminator appended to `ml_name` — allows multiple ML products in the same environment
`route_table_ids`	empty `[]`	No route table IDs — networking team manages S3 Gateway endpoint route associations manually
	populated	Route table IDs provided — S3 Gateway endpoint route associations configured automatically at deploy time

Config File	Tier	Image	Source Control	VPC Mode	VPC Source	Notes
`techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit.yaml`	starter	`ml-provisioner:starter`	codecommit	—	—	Representative starter scenario
`techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit-workload.yaml`	starter	`ml-provisioner:starter`	codecommit	—	—	workload=realtime variant
`techcorp-prod-a001-us-west-2-customer-churn-ml-s3.yaml`	starter	`ml-provisioner:starter`	s3	—	—	Starter + S3 source
`techcorp-prod-a001-us-west-2-customer-churn-ml-s3-workload.yaml`	starter	`ml-provisioner:starter`	s3	—	—	S3 + workload=realtime variant
`edge-prod-b001-us-west-2-fraud-detection-ml-codecommit.yaml`	professional	`ml-provisioner:professional`	codecommit	—	—	Representative professional scenario
`edge-prod-b001-us-west-2-fraud-detection-ml-codecommit-workload.yaml`	professional	`ml-provisioner:professional`	codecommit	—	—	workload=realtime variant
`edge-prod-b001-us-west-2-fraud-detection-ml-s3.yaml`	professional	`ml-provisioner:professional`	s3	—	—	Professional + S3 source
`edge-prod-b001-us-west-2-fraud-detection-ml-s3-workload.yaml`	professional	`ml-provisioner:professional`	s3	—	—	S3 + workload=realtime variant
`globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm.yaml`	enterprise	`ml-provisioner:enterprise`	codecommit	standalone	ssm	Representative enterprise scenario
`globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-direct.yaml`	enterprise	`ml-provisioner:enterprise`	codecommit	standalone	direct
`globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-ssm.yaml`	enterprise	`ml-provisioner:enterprise`	codecommit	sgprov	ssm
`globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-direct.yaml`	enterprise	`ml-provisioner:enterprise`	codecommit	sgprov	direct
`globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-ssm.yaml`	enterprise	`ml-provisioner:enterprise`	s3	standalone	ssm
`globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-standalone-direct.yaml`	enterprise	`ml-provisioner:enterprise`	s3	standalone	direct
`globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-sgprov-ssm.yaml`	enterprise	`ml-provisioner:enterprise`	s3	sgprov	ssm
`globalbank-prod-c001-us-west-2-demand-forecasting-ml-s3-sgprov-direct.yaml`	enterprise	`ml-provisioner:enterprise`	s3	sgprov	direct
`globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm-workload.yaml`	enterprise	`ml-provisioner:enterprise`	codecommit	standalone	ssm	workload=realtime variant
`globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-direct-rtb.yaml`	enterprise	`ml-provisioner:enterprise`	codecommit	standalone	direct	route_table_ids populated variant
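The sample file names in the matrix appear to encode the dimension values in a fixed order. The sketch below rebuilds a file name from those values; the pattern is inferred from the names listed above rather than documented by the tool, and `scenario_config` is a hypothetical helper.

```shell
# Hypothetical helper: build a sample config file name from scenario dimensions.
# The naming pattern is inferred from the matrix, not taken from the tool itself.
scenario_config() {
  prefix=$1          # e.g. techcorp-prod-a001-us-west-2-customer-churn
  source_control=$2  # codecommit | s3
  vpc_mode=$3        # standalone | sgprov | "" (non-enterprise tiers)
  vpc_source=$4      # ssm | direct | ""
  variant=$5         # workload | rtb | ""

  name="${prefix}-ml-${source_control}"
  # Only enterprise scenarios carry VPC mode and VPC source.
  if [ -n "$vpc_mode" ]; then
    name="${name}-${vpc_mode}-${vpc_source}"
  fi
  # Optional variant suffix.
  if [ -n "$variant" ]; then
    name="${name}-${variant}"
  fi
  printf '%s.yaml\n' "$name"
}
```

For example, `scenario_config techcorp-prod-a001-us-west-2-customer-churn s3 "" "" ""` yields the starter S3 file name from the matrix.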

Representative Scenarios¶

Set the CONFIG and IMAGE variables for your tier, then run the commands below.

Tier	IMAGE	CONFIG
Starter	`starter`	`techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit.yaml`
Professional	`professional`	`edge-prod-b001-us-west-2-fraud-detection-ml-codecommit.yaml`
Enterprise	`enterprise`	`globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm.yaml`

Note (enterprise prerequisite): the VPC must be deployed and the following SSM paths populated:

/vpc/globalbank-prod-c001-us-west-2-vpc/VPCId
/vpc/globalbank-prod-c001-us-west-2-vpc/PrivateSubnetIds
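One way to confirm the enterprise prerequisite before running anything is to query both SSM paths with the AWS CLI. This is a minimal sketch that assumes the AWS CLI is installed locally and that the active credentials may read these parameters; `check_enterprise_ssm` is a hypothetical helper name.

```shell
# Hypothetical check: report whether the two SSM parameters exist.
# Takes the VPC name segment of the path, e.g. globalbank-prod-c001-us-west-2-vpc.
check_enterprise_ssm() {
  vpc_name=$1
  missing=0
  for key in VPCId PrivateSubnetIds; do
    path="/vpc/${vpc_name}/${key}"
    if aws ssm get-parameter --name "$path" \
         --query 'Parameter.Value' --output text >/dev/null 2>&1; then
      echo "ok      $path"
    else
      echo "missing $path"
      missing=1
    fi
  done
  # Non-zero exit status when at least one parameter could not be read.
  return "$missing"
}
```

Calling `check_enterprise_ssm globalbank-prod-c001-us-west-2-vpc` prints one line per path and exits non-zero if either lookup fails.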

For example:

CONFIG=techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit.yaml
IMAGE=starter

# 1. List available products
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act list-products

# 2. Show product resources for this tier
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act show-product

# 3. Validate configuration
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act validate-config

# 4. Generate IAM policy
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/policies:/app/policies \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-policy

# 5. Generate CloudFormation template
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-prov-template

# 6. Validate generated template
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act validate-prov-template

# 7. Generate pre-deployment review report
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-review-report

# 8. Preview changes (requires deployed stack)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act show-changes

# 9. Check drift (requires deployed stack)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act check-drift

# 10. Test deploy — stack name printed upon completion, note it down
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act test-deploy

# Delete the test stack once verified (replace <test-stack-name> with the name printed above)
aws cloudformation delete-stack --stack-name <test-stack-name>
aws cloudformation wait stack-delete-complete --stack-name <test-stack-name>

# 11. Deploy product
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act deploy-product --force

# 12. Delete product
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act delete-product --force
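The twelve commands differ only in the action name and in which output directories they mount, so a small shell function can cut the repetition. The sketch below is not shipped with the tool: `mlp` and `DRY_RUN` are hypothetical names, and it mounts every directory for every action, which is an assumption made for brevity rather than the per-action mounts shown above.

```shell
# Hypothetical wrapper around the docker invocation used in steps 1-12.
# Expects IMAGE and CONFIG to be set; run it from the mlops-infra-suite/ root.
mlp() {
  action=$1
  shift
  # Build the full command line; any extra arguments (e.g. --force) go last.
  set -- docker run --rm \
    -v "$HOME/.aws:/home/mluser/.aws:ro" \
    -v "$(pwd)/ml/configs:/app/configs:ro" \
    -v "$(pwd)/ml/policies:/app/policies" \
    -v "$(pwd)/ml/templates:/app/templates" \
    -v "$(pwd)/ml/reports:/app/reports" \
    "ml-provisioner:${IMAGE}" -con "${CONFIG}" -act "$action" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
```

With `CONFIG` and `IMAGE` set as above, `mlp validate-config` would stand in for step 3 and `mlp deploy-product --force` for step 11, while `DRY_RUN=1 mlp show-changes` only prints the command.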

Multi-Environment Deployment¶

The recommended pattern is to deploy a different tier per environment, each in its own AWS account, following the AWS Well-Architected best practice of account-per-environment isolation.

Environment	Tier	Image
dev	starter	`ml-provisioner:starter`
staging	professional	`ml-provisioner:professional`
prod	enterprise	`ml-provisioner:enterprise`

Each environment requires a separate config file. See Application Architecture for architecture details.

Switching AWS Accounts¶

Each Docker command mounts ~/.aws from the local machine. To target a different AWS account per environment, set the AWS_PROFILE environment variable in the Docker run command:

# Deploy dev (starter tier, dev AWS account)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  -e AWS_PROFILE=dev-profile \
  ml-provisioner:starter \
  -con techcorp-dev-a001-us-west-2-customer-churn-ml-codecommit.yaml \
  -act deploy-product --force

# Deploy staging (professional tier, staging AWS account)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  -e AWS_PROFILE=staging-profile \
  ml-provisioner:professional \
  -con edge-staging-b001-us-west-2-fraud-detection-ml-codecommit.yaml \
  -act deploy-product --force

# Deploy prod (enterprise tier, prod AWS account)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  -e AWS_PROFILE=prod-profile \
  ml-provisioner:enterprise \
  -con globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm.yaml \
  -act deploy-product --force
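Because a wrong profile name would deploy into the wrong account, it can help to confirm which account a profile resolves to first. The following sketch uses `aws sts get-caller-identity`; the helper name `assert_account` is hypothetical, and the account ID in the usage note is a placeholder.

```shell
# Hypothetical guard: fail unless the profile resolves to the expected account ID.
assert_account() {
  profile=$1
  expected=$2
  actual=$(aws sts get-caller-identity --profile "$profile" \
             --query Account --output text) || return 2
  if [ "$actual" != "$expected" ]; then
    echo "profile $profile resolves to account $actual, expected $expected" >&2
    return 1
  fi
}
```

Chaining it in front of a deployment, as in `assert_account dev-profile 111111111111 && docker run ...`, stops the deployment when the profile points at a different account.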

Volume Mounts¶

Mount	Container Path	Purpose	Required For
`~/.aws`	`/home/mluser/.aws`	AWS credentials	All actions
`ml/configs`	`/app/configs`	Input configuration files	All actions
`ml/policies`	`/app/policies`	Generated IAM policies	`create-policy`
`ml/templates`	`/app/templates`	CloudFormation templates	`create-prov-template`, `validate-prov-template`, `create-review-report`, `deploy-product`, `show-changes`
`ml/reports`	`/app/reports`	Execution logs and HTML reports	All actions

Notes:

Mount configs/ and ~/.aws/ as read-only (:ro) — the tool never writes to these
Output directories (policies/, templates/, reports/) must be writable — do not use :ro
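When a bind-mount source given with `-v` does not exist, Docker creates it, typically owned by root on Linux, which can leave the output directories unwritable for the container user. A minimal sketch that creates them up front is shown here; `prepare_output_dirs` is a hypothetical helper, and the directory names are the ones from the table above.

```shell
# Hypothetical helper: create the writable output directories before the first run.
# The optional argument is the repository root; it defaults to the current directory.
prepare_output_dirs() {
  root=${1:-.}
  for dir in policies templates reports; do
    mkdir -p "$root/ml/$dir" || return 1
  done
}
```

Running `prepare_output_dirs` once from the mlops-infra-suite/ root is enough; `mkdir -p` leaves existing directories untouched.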

AWS Credentials¶

All actions require AWS credentials for subscription validation. Actions that interact with AWS infrastructure (show-changes, check-drift, test-deploy, deploy-product, delete-product) also require permissions for CloudFormation, SageMaker, CodePipeline, CodeBuild, IAM, S3, and SSM.

Option 1: AWS Profile (Recommended)

-v ~/.aws:/home/mluser/.aws:ro

Note: To target a specific named profile, add -e AWS_PROFILE=<profile-name> to the Docker command. This is useful when each environment lives in a different AWS account.

Option 2: Environment Variables

-e AWS_ACCESS_KEY_ID=<access_key> \
-e AWS_SECRET_ACCESS_KEY=<secret_key> \
-e AWS_DEFAULT_REGION=us-west-2
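Writing secret values directly on the command line leaves them in shell history. `docker run` also accepts `-e NAME` with no value, in which case it copies the variable from the host environment. The sketch below builds such flags for whichever AWS variables are set; `aws_env_flags` is a hypothetical helper, and `AWS_SESSION_TOKEN` is included on the assumption that temporary credentials may be in use.

```shell
# Hypothetical helper: emit "-e NAME" for each AWS variable set in this shell.
# The values themselves are never printed.
aws_env_flags() {
  flags=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_DEFAULT_REGION; do
    # Portable indirect lookup of the variable called $name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      flags="${flags:+$flags }-e $name"
    fi
  done
  printf '%s\n' "$flags"
}
```

The output is meant to be expanded unquoted, for example `docker run --rm $(aws_env_flags) ...`, with the variables exported in the calling shell so that Docker can read them.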

Option 3: IAM Role (when running on EC2/ECS)

# No credentials needed — uses instance role

Note: This option is most common in enterprise CI/CD pipelines, where ML Provisioner runs on an EC2 instance or ECS task with an attached IAM role. Ensure that role has the permissions generated by create-policy.

Best Practices¶

Always validate first — Run validate-config before any AWS operations
Review the IAM policy — Run create-policy and attach the generated policy before deploying
Review the template — Run create-prov-template and inspect the generated CloudFormation template before deploying
Generate a review report — Run create-review-report and share with stakeholders before production deploy
Test before production — Use test-deploy to validate the full stack in isolation first
Preview changes — Run show-changes before re-deploying to an existing stack
Monitor drift — Run check-drift periodically to detect manual changes outside CloudFormation
Use IAM roles over access keys in production CI/CD pipelines
Version control configs — Store configuration files in Git for change tracking and rollback
Separate environments — Use separate config files and AWS accounts per environment
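The first few practices can be strung together as a gate that runs the local actions in order and stops at the first failure. This is only a sketch: `predeploy_checks` is a hypothetical function, and it takes as its arguments whatever command runs a single action (for instance a wrapper around the `docker run` line).

```shell
# Hypothetical gate: run the local checks in order, stop at the first failure.
# "$@" is the command that runs one action; the action name is appended to it.
predeploy_checks() {
  for action in validate-config create-policy create-prov-template \
                validate-prov-template create-review-report; do
    echo "==> $action"
    "$@" "$action" || { echo "failed: $action" >&2; return 1; }
  done
}
```

Only when `predeploy_checks <runner>` succeeds would a pipeline move on to test-deploy or deploy-product.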

FAQ¶

Q: Can I modify the generated CloudFormation template?
A: Yes, but changes will be overwritten on the next create-prov-template run. Use the YAML configuration to customise your setup instead.

Q: How do I upgrade to a new version?
A: Pull the latest Docker image for your tier. Existing deployed stacks are not affected unless you redeploy.

Q: What happens if deployment fails?
A: CloudFormation automatically rolls back all resources. Check the stack events and the log file in ml/reports/ for details. See Troubleshooting.

Q: Can I deploy to multiple regions?
A: Yes. Create a separate configuration file for each region and run the tool once per config.

Q: How do I delete everything?
A: Use delete-product --force to delete the CloudFormation stack and all associated resources.

Q: What is the workload field for?
A: It allows multiple ML products in the same AWS account and region by appending a discriminator to resource names, which avoids naming collisions.

Q: Can I use an existing VPC?
A: Yes, that is the enterprise tier use case. Set vpc_integration.mode to standalone or sgprov and provide your VPC ID either directly or via SSM Parameter Store.

Q: What SSM parameters does the tool publish after deployment?
A: See README under What Gets Created for the full list of SSM parameter names per tier.
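For the failed-deployment question, the stack events can be filtered down to the failures with the AWS CLI. The sketch below assumes the AWS CLI is installed and that you know the stack name; `failed_events` is a hypothetical helper, not a tool action.

```shell
# Hypothetical helper: list stack events whose status contains FAILED.
failed_events() {
  stack=$1
  aws cloudformation describe-stack-events \
    --stack-name "$stack" \
    --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[LogicalResourceId,ResourceStatus,ResourceStatusReason]" \
    --output table
}
```

The `ResourceStatusReason` column usually points at the resource that triggered the rollback, which can then be cross-checked against the log file in ml/reports/.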
