Migration Guide

Guide for onboarding existing ML infrastructure to the ML Provisioner, and for upgrading between tiers.


Migrating from Manual Setup

The ML Provisioner does not import or manage existing AWS resources. Migration from a manually provisioned ML setup means standing up a new ML Provisioner-managed stack alongside the existing infrastructure, validating it, and cutting over.

Step 1: Inventory Existing Resources

Collect the data elements needed to populate a configuration file. For each environment and scenario you intend to migrate:

  • AWS account ID and region

  • VPC ID, subnet IDs, and security group ID (enterprise tier)

  • SSM Parameter Store paths if VPC integration uses vpc_source: ssm

  • Use case and workload names; these drive the ml_name and all resource names

  • Source control preference: CodeCommit or S3

Step 2: Create Configuration Files

Create one configuration file per environment and scenario. Use the naming convention described in Naming Conventions:

{company_prefix}-{env}-{tenant_id}-{region}-{use_case}-{workload}-ml-{scenario}.yaml

See Configuration Reference and Configuration Guide for all available fields.
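As a rough illustration of the naming convention above, the following sketch composes a config filename from its parts. The function name `config_filename` and the field values are hypothetical (in particular `use_case=retail`), and treating `codecommit-sgprov-ssm` as the scenario segment is inferred from the example filenames in this guide rather than stated by it.

```shell
#!/bin/sh
# Compose a config filename following the documented convention:
# {company_prefix}-{env}-{tenant_id}-{region}-{use_case}-{workload}-ml-{scenario}.yaml
# config_filename is a hypothetical helper, not part of the ML Provisioner.
config_filename() {
  # Args: company_prefix env tenant_id region use_case workload scenario
  if [ "$#" -ne 7 ]; then
    echo "usage: config_filename PREFIX ENV TENANT REGION USE_CASE WORKLOAD SCENARIO" >&2
    return 1
  fi
  printf '%s-%s-%s-%s-%s-%s-ml-%s.yaml\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Hypothetical field values, for illustration only.
config_filename globalbank prod c001 us-west-2 retail demand-forecasting codecommit-sgprov-ssm
```

Generating the filename from the same values that go into the config fields helps keep the two from drifting apart.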

Step 3: Validate and Test

CONFIG=globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-ssm.yaml
IMAGE=enterprise-1.0.0

# Validate configuration
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act validate-config

# Generate and review CloudFormation template
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-prov-template

docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/templates:/app/templates \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act create-review-report

# Test deployment: catches naming conflicts and permission issues
docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act test-deploy

test-deploy creates an isolated stack with a random suffix. It will fail if there are naming conflicts with existing resources, misconfigured VPC references, or IAM permission gaps. Delete the test stack when done:

aws cloudformation delete-stack --stack-name <test-stack-name> --region us-west-2
aws cloudformation wait stack-delete-complete --stack-name <test-stack-name> --region us-west-2
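The four `docker run` commands above differ only in the action name and in whether the templates directory is mounted. One way to reduce the repetition is a small wrapper like the sketch below. The function name `run_action` and the `DOCKER` override variable are assumptions introduced here for illustration; the mounts, image naming, and flags are copied from the commands in this guide.

```shell
#!/bin/sh
# run_action ACTION CONFIG IMAGE
# Hypothetical wrapper around the repeated `docker run` invocation.
# Set DOCKER=echo to print the command instead of running it (dry run).
run_action() {
  action="$1"; config="$2"; image="$3"
  # Only the two template-related actions mount the templates directory,
  # mirroring the commands shown in this guide.
  case "$action" in
    create-prov-template|create-review-report)
      set -- -v "$(pwd)/ml/templates:/app/templates" ;;
    *)
      set -- ;;
  esac
  "${DOCKER:-docker}" run --rm \
    -v "$HOME/.aws:/home/mluser/.aws:ro" \
    -v "$(pwd)/ml/configs:/app/configs:ro" \
    "$@" \
    -v "$(pwd)/ml/reports:/app/reports" \
    "ml-provisioner:${image}" -con "$config" -act "$action"
}

# Example usage (commented out so that sourcing this file runs nothing):
# for act in validate-config create-prov-template create-review-report test-deploy; do
#   run_action "$act" "$CONFIG" "$IMAGE" || break
# done
```

Stopping at the first failing action keeps a broken config from reaching `test-deploy`.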

Step 4: Deploy

Once test-deploy passes, follow the full provisioning sequence in the User Guide to deploy the production stack.

Step 5: Cutover

Cutover is outside the scope of the ML Provisioner. The MLOps team is responsible for:

  • Migrating pipeline definitions and model registry entries to the new stack

  • Updating downstream services to consume the new SSM Parameter Store paths

  • Decommissioning old resources once the new stack is fully validated


Upgrading Tiers

What Each Tier Adds

Tiers are additive. Each higher tier includes everything from the lower tier plus additional resources:

| Resource                                  | Starter | Professional | Enterprise          |
|-------------------------------------------|---------|--------------|---------------------|
| SageMaker Model Registry                  | ✔       | ✔            | ✔                   |
| CodeCommit Repositories (×2)              | ✔       | ✔            | ✔                   |
| CodeBuild Projects (×2)                   | ✔       | ✔            | ✔                   |
| CodePipeline Pipelines (×2)               | ✔       | ✔            | ✔                   |
| IAM Roles (×3)                            | ✔       | ✔            | ✔                   |
| S3 Artifacts Bucket                       | ✔       | ✔            | ✔                   |
| EventBridge Rule (model approval trigger) |         | ✔            | ✔                   |
| CloudWatch Dashboard                      |         | ✔            | ✔                   |
| IAM Managed Policies (×2)                 |         | ✔            | ✔                   |
| KMS Key + Alias                           |         |              | ✔                   |
| IAM Permission Boundary                   |         |              | ✔                   |
| CloudWatch Log Group (compliance)         |         |              | ✔                   |
| CloudWatch Alarms (×2)                    |         |              | ✔                   |
| SNS Topic + Subscription (alerts)         |         |              | ✔                   |
| VPC Endpoints (×4)                        |         |              | ✔                   |
| Endpoint Security Group                   |         |              | ✔ (standalone mode) |

Tier Upgrade Procedure

Upgrading tiers requires deploying a new stack from the higher-tier Docker image. The existing lower-tier stack continues to run in parallel until cutover is complete.

Step 1: Create a new configuration file for the higher tier

Start from the existing lower-tier config as a base. Copy it and update the filename to reflect the new tier context:

cp ml/configs/globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-ssm.yaml \
   ml/configs/globalbank-prod-c001-us-west-2-demand-forecasting-ent-ml-codecommit-sgprov-ssm.yaml

Step 2: Update the new config with tier-specific fields

The higher-tier Docker image requires additional config fields. For enterprise tier, add:

vpc_integration:
  mode: sg-provisioner          # or standalone
  vpc_source: ssm               # or direct
  vpc_parameter_store_path: /vpc/globalbank-prod/VpcId
  subnet_parameter_store_path: /vpc/globalbank-prod/SubnetIds
  sg_parameter_store_path: /sg/globalbank-prod/SecurityGroupId   # sg-provisioner mode only
  route_table_ids: []           # required for S3 Gateway endpoint

compliance:
  log_retention_days: 90

alerts:
  alerts_email: mlops-alerts@globalbank.com

See Configuration Reference for the full field reference per tier.
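Before running validate-config, a quick local check can confirm that the copied config actually gained the enterprise sections shown above. The sketch below is a rough pre-check, not a substitute for `-act validate-config`. The function name `missing_sections` is hypothetical, and it assumes `vpc_integration`, `compliance`, and `alerts` appear as unindented top-level keys, as in the snippet above.

```shell
#!/bin/sh
# missing_sections FILE
# Print each enterprise-tier section that is absent from the config file.
# Hypothetical helper; assumes the sections are top-level YAML keys.
missing_sections() {
  file="$1"
  for section in vpc_integration compliance alerts; do
    # Anchor at line start so nested or commented keys do not count.
    grep -q "^${section}:" "$file" || echo "$section"
  done
}

# Example usage (path is illustrative):
# missing_sections ml/configs/my-enterprise-config.yaml
```

Empty output means all three sections are present. It says nothing about whether the values inside them are valid.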

Step 3: Set a different workload value to avoid naming collisions

The ml_name is derived from config fields including workload. Since both the old and new stacks will coexist in the same AWS account and region, they must have different ml_name values to avoid resource naming conflicts. Use workload as the differentiator:

# Lower-tier config (existing)
workload: demand-forecasting

# Higher-tier config (new)
workload: demand-forecasting-ent

This produces distinct ml_name values:

  • globalbank-prod-c001-us-west-2-demand-forecasting-ml (existing stack)

  • globalbank-prod-c001-us-west-2-demand-forecasting-ent-ml (new stack)

And distinct SSM paths:

  • /ml/globalbank-prod-c001-us-west-2-demand-forecasting-ml/...

  • /ml/globalbank-prod-c001-us-west-2-demand-forecasting-ent-ml/...

Step 4: Validate and test

CONFIG=globalbank-prod-c001-us-west-2-demand-forecasting-ent-ml-codecommit-sgprov-ssm.yaml
IMAGE=enterprise-1.0.0

docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act validate-config

docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${IMAGE} -con ${CONFIG} -act test-deploy

Step 5: Deploy the higher-tier stack

Follow the full provisioning sequence in the User Guide using the new config file and the higher-tier Docker image.

Avoiding Naming Collisions

The ml_name is constructed from:

{company_prefix}-{environment}-{tenant_id}-{region}-{use_case}-{workload}-ml

Tier is not part of the name. Two stacks with identical config fields but different tiers will have the same ml_name and collide. Always set a distinct workload value in the higher-tier config when running stacks in parallel.
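A collision of this kind can be caught before any deployment by deriving the ml_name from both config files and comparing the results. The sketch below assumes the six fields are flat, unquoted top-level `key: value` lines whose key names match the placeholders in the pattern above. That layout is an assumption, not something this guide confirms, and the helper names are hypothetical.

```shell
#!/bin/sh
# yaml_value KEY FILE
# Read a top-level scalar, stripping trailing comments and whitespace.
# Assumes flat, unquoted `key: value` lines (not a full YAML parser).
yaml_value() {
  sed -n "s/^$1:[[:space:]]*//p" "$2" \
    | sed 's/[[:space:]]*#.*$//; s/[[:space:]]*$//' \
    | head -n 1
}

# ml_name_from_config FILE
# Build {company_prefix}-{environment}-{tenant_id}-{region}-{use_case}-{workload}-ml
ml_name_from_config() {
  name=""
  for key in company_prefix environment tenant_id region use_case workload; do
    name="${name}$(yaml_value "$key" "$1")-"
  done
  echo "${name}ml"
}

# check_collision OLD_CONFIG NEW_CONFIG
# Returns 1 (with a message on stderr) when both configs resolve to the same ml_name.
check_collision() {
  a=$(ml_name_from_config "$1")
  b=$(ml_name_from_config "$2")
  if [ "$a" = "$b" ]; then
    echo "collision: both configs resolve to $a" >&2
    return 1
  fi
  echo "ok: $a vs $b"
}
```

Because tier is not part of the name, comparing the two derived names is enough. The image tag never enters the check.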

After Cutover

Once the higher-tier stack is fully validated and all downstream services have been updated to consume its SSM paths, decommission the lower-tier stack:

OLD_CONFIG=globalbank-prod-c001-us-west-2-demand-forecasting-ml-codecommit-sgprov-ssm.yaml
OLD_IMAGE=starter-1.0.0

docker run --rm \
  -v ~/.aws:/home/mluser/.aws:ro \
  -v $(pwd)/ml/configs:/app/configs:ro \
  -v $(pwd)/ml/reports:/app/reports \
  ml-provisioner:${OLD_IMAGE} -con ${OLD_CONFIG} -act delete-product --force

Warning: Verify no downstream services are still consuming the old stack’s SSM parameters before running delete-product. See Update Procedures for the full implications of stack deletion.
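One partial way to support that verification is to search local checkouts of downstream services for the old stack's SSM prefix. The sketch below only finds static references in files on disk. It cannot detect services that build the path at runtime, so treat a clean result as a hint rather than proof. The function name `find_old_ssm_refs` and the idea of a single directory holding the checkouts are assumptions.

```shell
#!/bin/sh
# find_old_ssm_refs OLD_ML_NAME SEARCH_DIR
# Print files under SEARCH_DIR that still mention /ml/<OLD_ML_NAME>/.
# Exit status follows grep: 0 if any reference is found, 1 if none.
# Hypothetical helper; SEARCH_DIR is wherever downstream code is checked out.
find_old_ssm_refs() {
  old_ml_name="$1"
  search_dir="$2"
  # -r: recurse, -l: list file names only, -F: match the prefix literally
  grep -rlF "/ml/${old_ml_name}/" "$search_dir"
}

# Example usage (directory name is illustrative):
# find_old_ssm_refs globalbank-prod-c001-us-west-2-demand-forecasting-ml ./services
```

The trailing slash in the search string keeps the old prefix from matching the new `...-ent-ml` paths.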


Region Migration

Deploying to a different AWS region is a new deployment, not a migration. Because the region is part of the ml_name and all resource names, there are no naming collisions with an existing stack in another region.

Create a new configuration file for the target region, follow the full provisioning sequence in the User Guide, and manage cutover independently.