User Guide
All commands are run from the mlops-infra-suite/ root directory.
Pre-Deployment Checklist
Before deploying, ensure you have:
- Docker 20.10+ installed and running (`docker --version`)
- AWS credentials configured (`aws sts get-caller-identity` works)
- AWS Marketplace subscription active for ML Provisioner
- IAM permissions verified (see IAM Permissions)
- Working directories created: `ml/{configs,policies,templates,reports,docs}`
- Configuration file copied from Docker image and adjusted (see README)
- Reviewed generated CloudFormation template before deploying
- Tested with `test-deploy` before production deploy
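The first two checks and the directory setup can be scripted. The following is a minimal sketch, not part of the tool: the function names `make_workdirs`, `have_tool`, and `preflight` are illustrative, and the script only confirms that `docker --version` and `aws sts get-caller-identity` succeed. It does not compare the Docker version against 20.10 or verify the Marketplace subscription.

```bash
# Preflight sketch for the checklist above (function names are illustrative).

# Create the working directories from the checklist under a root directory.
make_workdirs() {
  local root="${1:-.}" d
  for d in configs policies templates reports docs; do
    mkdir -p "$root/ml/$d" || return 1
  done
}

# Succeed if a CLI tool is available on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Run all checks; print one line per check, return non-zero if any fails.
preflight() {
  local status=0
  if have_tool docker && docker --version >/dev/null 2>&1; then
    echo "ok: docker"
  else
    echo "missing: docker"
    status=1
  fi
  if have_tool aws && aws sts get-caller-identity >/dev/null 2>&1; then
    echo "ok: aws credentials"
  else
    echo "missing: aws credentials"
    status=1
  fi
  if make_workdirs "${1:-.}"; then
    echo "ok: working directories"
  else
    echo "failed: working directories"
    status=1
  fi
  return "$status"
}
```

Run `preflight .` from the mlops-infra-suite/ root; a non-zero exit status means at least one item still needs attention.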
Professional Tier (additional)
- SNS email subscription confirmation: the `alerts_email` address will receive a confirmation email after the first deploy
Enterprise Tier (additional)
- VPC deployed and available before running `deploy-product`
- For `vpc_source: ssm`: SSM Parameter Store paths for VPC ID and subnet IDs populated (typically by vpc-provisioner)
- For `vpc_source: direct`: VPC ID and subnet IDs set in the config file
- For `sgprov` mode: SG Provisioner deployed and SG ID available in SSM
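For the `vpc_source: ssm` case, the parameters can be checked with the AWS CLI before deploying. This is a sketch under assumptions: `ssm_param_present` and `check_vpc_params` are hypothetical helper names, and the caller supplies the parameter paths, since the exact paths depend on your VPC and SG Provisioner setup.

```bash
# Succeed if an SSM parameter exists and has a non-empty value.
ssm_param_present() {
  local value
  value="$(aws ssm get-parameter --name "$1" \
    --query 'Parameter.Value' --output text 2>/dev/null)" || return 1
  [ -n "$value" ] && [ "$value" != "None" ]
}

# Check every path passed as an argument; print one line per path.
# Returns non-zero if any parameter is missing.
check_vpc_params() {
  local missing=0 path
  for path in "$@"; do
    if ssm_param_present "$path"; then
      echo "ok: $path"
    else
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

With the example enterprise paths used elsewhere in this guide, a call would look like `check_vpc_params /vpc/globalbank-prod-c001-us-west-2-vpc/VPCId /vpc/globalbank-prod-c001-us-west-2-vpc/PrivateSubnetIds`.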
Quick Reference: Actions by Safety Level
Local Actions (No AWS Calls besides subscription check)
| Action | Description |
|---|---|
| `validate-config` | Validate configuration YAML against schema |
| `list-products` | List available tier templates |
| `show-product` | Show resources for the selected tier |
| `create-policy` | Generate least-privilege IAM policy |
| `create-prov-template` | Generate CloudFormation provisioning template |
| `validate-prov-template` | Validate generated template locally |
| `create-review-report` | Generate pre-deployment HTML review report |
Read-Only AWS Actions
| Action | Description |
|---|---|
| `show-changes` | Preview what would change in the deployed stack |
| `check-drift` | Detect infrastructure drift against deployed stack |
| `test-deploy` | Deploy with test suffix for safe isolated testing |
Mutating AWS Actions (`--force` required)
| Action | Description |
|---|---|
| `deploy-product` | Deploy ML product infrastructure via CloudFormation |
| `delete-product` | Delete CloudFormation stack and all associated resources |
Prerequisites
See README for full prerequisites and installation instructions.
Configuration
For configuration file structure and field reference see Configuration Reference. For scenario-based guidance on selecting and populating the right config file see Configuration Guide.
Business Workflow
The commands below represent the complete lifecycle of an ML product deployment. Run them in order:
- Steps 1-7 are local and require no AWS calls besides the subscription check
- Steps 8-12 require AWS credentials and, for the enterprise tier, a VPC that is already deployed and available (see README for prerequisites)
Scenario Matrix
Dimension Values
| Dimension | Value | Meaning |
|---|---|---|
| Source Control | codecommit | AWS CodeCommit repositories are created for model-build and model-deploy source code |
| | s3 | An existing S3 bucket is used as the pipeline source; no CodeCommit repositories are created |
| VPC Mode | standalone | ML Provisioner creates and manages its own endpoint Security Group |
| | sgprov | Security Group is managed externally by SG Provisioner; ML Provisioner skips SG creation and reads the existing SG ID from SSM |
| VPC Source | ssm | VPC ID and subnet IDs are resolved at deploy time from SSM Parameter Store paths, typically populated by VPC Provisioner |
| | direct | VPC ID and subnet IDs are hardcoded directly in the configuration file |
| workload | empty | No workload discriminator |
| | set (for example `realtime`) | Workload discriminator appended to resource names |
| route_table_ids | empty | No route table IDs; the networking team manages S3 Gateway endpoint route associations manually |
| | populated | Route table IDs provided; S3 Gateway endpoint route associations are configured automatically at deploy time |
| Config File | Tier | Image | Source Control | VPC Mode | VPC Source | Notes |
|---|---|---|---|---|---|---|
| | starter | `ml-provisioner:starter` | codecommit | - | - | Representative starter scenario |
| | starter | `ml-provisioner:starter` | codecommit | - | - | workload=realtime variant |
| | starter | `ml-provisioner:starter` | s3 | - | - | Starter + S3 source |
| | starter | `ml-provisioner:starter` | s3 | - | - | S3 + workload=realtime variant |
| | professional | `ml-provisioner:professional` | codecommit | - | - | Representative professional scenario |
| | professional | `ml-provisioner:professional` | codecommit | - | - | workload=realtime variant |
| | professional | `ml-provisioner:professional` | s3 | - | - | Professional + S3 source |
| | professional | `ml-provisioner:professional` | s3 | - | - | S3 + workload=realtime variant |
| | enterprise | `ml-provisioner:enterprise` | codecommit | standalone | ssm | Representative enterprise scenario |
| | enterprise | `ml-provisioner:enterprise` | codecommit | standalone | direct | |
| | enterprise | `ml-provisioner:enterprise` | codecommit | sgprov | ssm | |
| | enterprise | `ml-provisioner:enterprise` | codecommit | sgprov | direct | |
| | enterprise | `ml-provisioner:enterprise` | s3 | standalone | ssm | |
| | enterprise | `ml-provisioner:enterprise` | s3 | standalone | direct | |
| | enterprise | `ml-provisioner:enterprise` | s3 | sgprov | ssm | |
| | enterprise | `ml-provisioner:enterprise` | s3 | sgprov | direct | |
| | enterprise | `ml-provisioner:enterprise` | codecommit | standalone | ssm | workload=realtime variant |
| | enterprise | `ml-provisioner:enterprise` | codecommit | standalone | direct | route_table_ids populated variant |
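Representative Scenarios
Set the variables `CONFIG` and `IMAGE` for your tier, then run the commands below.
| Tier | IMAGE | CONFIG |
|---|---|---|
| Starter | `starter` | `techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit.yaml` |
| Professional | `professional` | |
| Enterprise | `enterprise` | `globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm.yaml` |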
Note (Enterprise prerequisite):
- VPC deployed
- SSM paths populated:
  - `/vpc/globalbank-prod-c001-us-west-2-vpc/VPCId`
  - `/vpc/globalbank-prod-c001-us-west-2-vpc/PrivateSubnetIds`

For example:
```bash
CONFIG=techcorp-prod-a001-us-west-2-customer-churn-ml-codecommit.yaml
IMAGE=starter

# 1. List available products
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act list-products

# 2. Show product resources for this tier
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act show-product

# 3. Validate configuration
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act validate-config

# 4. Generate IAM policy
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/policies:/app/policies \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-policy

# 5. Generate CloudFormation template
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-prov-template

# 6. Validate generated template
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act validate-prov-template

# 7. Generate pre-deployment review report
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-review-report

# 8. Preview changes (requires deployed stack)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act show-changes

# 9. Check drift (requires deployed stack)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act check-drift

# 10. Test deploy: the stack name is printed on completion; note it down
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act test-deploy

# Delete the test stack once verified (replace <test-stack-name> with the name printed above)
aws cloudformation delete-stack --stack-name <test-stack-name>
aws cloudformation wait stack-delete-complete --stack-name <test-stack-name>

# 11. Deploy product
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act deploy-product --force

# 12. Delete product
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act delete-product --force
```
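The twelve commands above differ only in the mounted output directories, the action name, and the `--force` flag, so they can be wrapped in a small helper. The sketch below is one possible wrapper, not something shipped with the tool: `mlp_run` and the `DRY_RUN` variable are hypothetical names. As a simplification it mounts all three output directories for every action, whereas the commands above mount only what each action needs.

```bash
# Print (DRY_RUN=1) or execute the docker command for one action.
# Usage: mlp_run <image-tag> <config-file> <action> [extra args, e.g. --force]
mlp_run() {
  if [ "$#" -lt 3 ]; then
    echo "usage: mlp_run <image-tag> <config-file> <action> [args...]" >&2
    return 2
  fi
  local image="$1" config="$2" action="$3"
  shift 3
  # Rebuild the positional parameters as the full docker command line.
  set -- docker run --rm \
    -v "$HOME/.aws:/home/mluser/.aws:ro" \
    -v "$PWD/ml/configs:/app/configs:ro" \
    -v "$PWD/ml/policies:/app/policies" \
    -v "$PWD/ml/templates:/app/templates" \
    -v "$PWD/ml/reports:/app/reports" \
    "ml-provisioner:${image}" -con "$config" -act "$action" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 mlp_run starter "$CONFIG" deploy-product --force` prints the command for review, and dropping `DRY_RUN=1` runs it. Create the `ml/` working directories beforehand so Docker does not create them with unexpected ownership.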
Multi-Environment Deployment
The recommended pattern is to deploy a different tier per environment, each in its own AWS account, following the AWS Well-Architected best practice of account-per-environment isolation.
| Environment | Tier | Image |
|---|---|---|
| dev | starter | `ml-provisioner:starter` |
| staging | professional | `ml-provisioner:professional` |
| prod | enterprise | `ml-provisioner:enterprise` |
Each environment requires a separate config file. See Application Architecture for architecture details.
Switching AWS Accounts
Each Docker command mounts ~/.aws from the local machine. To target a different AWS account
per environment, set the AWS_PROFILE environment variable in the Docker run command:
```bash
# Deploy dev (starter tier, dev AWS account)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  -e AWS_PROFILE=dev-profile \
  ml-provisioner:starter \
  -con techcorp-dev-a001-us-west-2-customer-churn-ml-codecommit.yaml \
  -act deploy-product --force

# Deploy staging (professional tier, staging AWS account)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  -e AWS_PROFILE=staging-profile \
  ml-provisioner:professional \
  -con edge-staging-b001-us-west-2-fraud-detection-ml-codecommit.yaml \
  -act deploy-product --force

# Deploy prod (enterprise tier, prod AWS account)
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  -e AWS_PROFILE=prod-profile \
  ml-provisioner:enterprise \
  -con globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-standalone-ssm.yaml \
  -act deploy-product --force
```
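Because the only thing separating environments here is the `AWS_PROFILE` value, it can be worth confirming which account each profile resolves to before running a mutating action. The following sketch uses `aws sts get-caller-identity` for that; the helper names `profile_account` and `assert_distinct_accounts` are illustrative, and the profile names are the ones from the examples above.

```bash
# Print the AWS account ID that a named profile resolves to.
profile_account() {
  aws sts get-caller-identity --profile "$1" --query 'Account' --output text
}

# Succeed only if every given profile maps to a different account.
assert_distinct_accounts() {
  local seen="" profile account
  for profile in "$@"; do
    account="$(profile_account "$profile")" || return 1
    case " $seen " in
      *" $account "*)
        echo "duplicate account $account for $profile" >&2
        return 1
        ;;
    esac
    seen="$seen $account"
    echo "$profile -> $account"
  done
}
```

Calling `assert_distinct_accounts dev-profile staging-profile prod-profile` prints one mapping per profile and fails if two profiles point at the same account, which would defeat the account-per-environment isolation described above.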
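Volume Mounts
| Mount | Container Path | Purpose | Required For |
|---|---|---|---|
| `~/.aws` | `/home/mluser/.aws` | AWS credentials | All actions |
| `ml/configs` | `/app/configs` | Input configuration files | All actions |
| `ml/policies` | `/app/policies` | Generated IAM policies | `create-policy` |
| `ml/templates` | `/app/templates` | CloudFormation templates | `create-prov-template`, `validate-prov-template`, `create-review-report`, `show-changes`, `deploy-product` |
| `ml/reports` | `/app/reports` | Execution logs and HTML reports | All actions |

Notes:
- Mount `configs/` and `~/.aws/` as read-only (`:ro`); the tool never writes to these
- Output directories (`policies/`, `templates/`, `reports/`) must be writable; do not use `:ro`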
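A quick host-side check of the output directories can catch permission problems before a run. This is a sketch only: `check_output_dirs` is a hypothetical helper, and it tests writability for the current host user. The user inside the container may have a different UID, so passing this check is necessary but may not be sufficient.

```bash
# Verify that each output directory exists and is writable by the current user.
check_output_dirs() {
  local root="${1:-.}" status=0 dir
  for dir in policies templates reports; do
    if [ -d "$root/ml/$dir" ] && [ -w "$root/ml/$dir" ]; then
      echo "ok: ml/$dir"
    else
      echo "missing or not writable: ml/$dir"
      status=1
    fi
  done
  return "$status"
}
```

Run `check_output_dirs .` from the mlops-infra-suite/ root.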
AWS Credentials
All actions require AWS credentials for subscription validation. Actions that interact with AWS infrastructure (show-changes, check-drift, test-deploy, deploy-product, delete-product) also require permissions for CloudFormation, SageMaker, CodePipeline, CodeBuild, IAM, S3, and SSM.
Option 1: AWS Profile (Recommended)
```bash
-v ~/.aws:/home/mluser/.aws:ro
```
Note: To target a specific named profile, add `-e AWS_PROFILE=<profile-name>` to the Docker command. This is useful when targeting different AWS accounts per environment.
Option 2: Environment Variables
```bash
-e AWS_ACCESS_KEY_ID=<access_key> \
-e AWS_SECRET_ACCESS_KEY=<secret_key> \
-e AWS_DEFAULT_REGION=us-west-2
```
Option 3: IAM Role (when running on EC2/ECS)
```bash
# No credentials needed: uses the instance role
```
Note: Most commonly used in enterprise CI/CD pipelines where the ML Provisioner runs from an EC2 instance or ECS task with an attached IAM role. Ensure the instance role has the permissions generated by `create-policy`.
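For Option 2, `docker run -e NAME` without a value copies `NAME` from the calling shell's environment, which keeps secrets off the command line. The sketch below builds the flags that way; `env_credential_args` is a hypothetical helper, the `us-west-2` default mirrors the example above, and forwarding `AWS_SESSION_TOKEN` assumes the tool reads the standard AWS SDK environment variables.

```bash
# Build docker -e flags that forward exported AWS credentials by name only.
env_credential_args() {
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first" >&2
    return 1
  fi
  local args="-e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY"
  # Temporary credentials also carry a session token.
  if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
    args="$args -e AWS_SESSION_TOKEN"
  fi
  echo "$args -e AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION:-us-west-2}"
}
```

The output is meant to be spliced unquoted into a command, for example `docker run --rm $(env_credential_args) ...`, so that the shell splits it into separate arguments.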
Best Practices
- Always validate first: run `validate-config` before any AWS operations
- Review the IAM policy: run `create-policy` and attach the generated policy before deploying
- Review the template: run `create-prov-template` and inspect the generated CloudFormation template before deploying
- Generate a review report: run `create-review-report` and share it with stakeholders before a production deploy
- Test before production: use `test-deploy` to validate the full stack in isolation first
- Preview changes: run `show-changes` before re-deploying to an existing stack
- Monitor drift: run `check-drift` periodically to detect manual changes made outside CloudFormation
- Use IAM roles over access keys in production CI/CD pipelines
- Version control configs: store configuration files in Git for change tracking and rollback
- Separate environments: use separate config files and AWS accounts per environment
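The local practices above can be chained into a gate that stops at the first failing action. This is a sketch under assumptions: `GATE_ACTIONS`, `RUN_ACTION`, and `run_gate` are hypothetical names, and `RUN_ACTION` must be set to a command or shell function that takes one action name and runs the corresponding `docker run` command for your config.

```bash
# Ordered local actions drawn from the best practices above.
GATE_ACTIONS="validate-config create-policy create-prov-template validate-prov-template create-review-report"

# Run each action through the command named in RUN_ACTION (a hypothetical hook);
# stop at the first failure and report which action failed.
run_gate() {
  local action
  for action in $GATE_ACTIONS; do
    echo "==> $action"
    "$RUN_ACTION" "$action" || {
      echo "gate failed at: $action" >&2
      return 1
    }
  done
  echo "gate passed"
}
```

Only local actions are included, so the gate itself creates no AWS resources; `test-deploy` and `show-changes` would follow as separate, deliberate steps.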
FAQ
Q: Can I modify the generated CloudFormation template?
A: Yes, but changes will be overwritten on the next `create-prov-template` run. Use the YAML configuration to customise your setup instead.
Q: How do I upgrade to a new version?
A: Pull the latest Docker image for your tier. Existing deployed stacks are not affected unless you redeploy.
Q: What happens if deployment fails?
A: CloudFormation automatically rolls back all resources. Check the stack events and the log file in `ml/reports/` for details. See Troubleshooting.
Q: Can I deploy to multiple regions?
A: Yes. Create a separate configuration file for each region and run the tool once per config.
Q: How do I delete everything?
A: Use `delete-product --force` to delete the CloudFormation stack and all associated resources.
Q: What is the `workload` field for?
A: It allows multiple ML products in the same AWS account and region by appending a discriminator to resource names, which avoids naming collisions.
Q: Can I use an existing VPC?
A: Yes, that is the enterprise tier use case. Set `vpc_integration.mode` to `standalone` or `sgprov` and provide your VPC ID either directly or via SSM Parameter Store.
Q: What SSM parameters does the tool publish after deployment?
A: See README under What Gets Created for the full list of SSM parameter names per tier.