Resources - Detailed explanation of purpose

AWS::SageMaker::ModelPackageGroup

The AWS::SageMaker::ModelPackageGroup is the CloudFormation resource that creates a model group in the SageMaker Model Registry: a centralized catalog for the versions of a machine learning model. It serves as the “Model Registry” entry point for this stack and forms the handoff point between your model building (CI) and model deployment (CD) pipelines.

Here is a detailed breakdown of how it functions as the logical bridge in an MLOps architecture.

The Logical Handshake: Connecting Build and Deploy

In mature MLOps, training and deployment are entirely separate pipelines. The Model Registry acts as the data contract between them.

  • The Build Pipeline (CI): Trains a model, evaluates its performance, generates a model artifact (model.tar.gz), and registers it by creating a new ModelPackage version inside the ModelPackageGroup.

  • The Deploy Pipeline (CD): Polls or listens for changes in the ModelPackageGroup. It only triggers when a new version is registered and meets specific criteria. It never looks at raw training code or unverified artifacts.

The Approval Gate (The Security Boundary)

The registry provides a built-in governance mechanism using the ModelApprovalStatus property. This property acts as a gatekeeper for production environments:

  • PendingManualApproval: The default state when a new model version is registered. Deploy pipelines are typically configured to ignore versions in this state.

  • Approved: A data scientist, manager, or automated test suite sets the status to Approved, either manually or programmatically. SageMaker emits the status change as an Amazon EventBridge event, which the deploy pipeline can react to.

  • Rejected: The model failed to meet business or performance metrics. The deploy pipeline blocks it, ensuring it never touches production.
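
The gate described above can be sketched as a small predicate that a deploy pipeline might call before acting on a version. This is a minimal illustration, not the provisioner's implementation: the function name is_deployable is hypothetical, while the three status strings are the ModelApprovalStatus values listed above.

```python
# ModelApprovalStatus values described in this section.
APPROVED = "Approved"
REJECTED = "Rejected"
PENDING = "PendingManualApproval"

VALID_STATUSES = {APPROVED, REJECTED, PENDING}


def is_deployable(model_approval_status: str) -> bool:
    """Return True only for versions that passed the approval gate.

    Hypothetical helper: unknown statuses raise instead of silently
    passing, so a typo can never open the gate.
    """
    if model_approval_status not in VALID_STATUSES:
        raise ValueError(f"unknown approval status: {model_approval_status!r}")
    return model_approval_status == APPROVED
```

Raising on unknown values is a design choice for this sketch: a misspelled status fails loudly rather than being treated as "not approved" and quietly skipped.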

Version Management

Instead of overwriting artifacts or tracking models via disorganized S3 folder names (e.g., model_v2_final_latest), the ModelPackageGroup enforces sequential immutable versioning.

  • Each time a model is registered, it receives an automatic incremented version number (e.g., 1, 2, 3).

  • Every version tracks its own unique S3 artifact URI, container image location, and hyperparameters.

  • Rollbacks become straightforward: if version 5 fails in production, the deploy pipeline can redeploy version 4 directly from the registry.
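
As a rough sketch of the rollback idea, the snippet below picks the newest approved version that precedes a failed one. It assumes the caller has already fetched version numbers and approval statuses from the registry into a plain dictionary; the function name rollback_target and the sample data are hypothetical.

```python
from typing import Optional


def rollback_target(versions: dict[int, str], failed_version: int) -> Optional[int]:
    """Pick the newest Approved version older than the failed one.

    `versions` maps a model package version number to its approval status.
    Returns None when no earlier approved version exists.
    """
    candidates = [
        number
        for number, status in versions.items()
        if number < failed_version and status == "Approved"
    ]
    return max(candidates, default=None)


# Illustrative registry contents.
registry = {1: "Approved", 2: "Rejected", 3: "Approved", 4: "Approved", 5: "Approved"}
print(rollback_target(registry, failed_version=5))  # prints 4
```

Filtering on approval status matters here: a rollback should never land on a version that was rejected or is still pending review.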

Traceability and Lineage

Auditing machine learning models is a strict regulatory requirement in many industries. Every ModelPackage registered under a group retains full lineage tracking:

  • Data Lineage: Links directly to the exact Amazon S3 location of the training and validation datasets used.

  • Code Lineage: Tracks the specific training container image and algorithm details.

  • Metrics Lineage: Stores validation metrics (like accuracy, F1 score, or ROC curves) directly inside the registry metadata, allowing teams to compare versions objectively before approving them.

In the ML Provisioner

  • Tier: Starter, Professional, Enterprise (all tiers)

  • Count: 1 per stack

  • Role: Serves as the handshake point between the Build pipeline (CI) and the Deploy pipeline (CD). The Build pipeline registers trained model versions here. The Deploy pipeline listens for approval status changes via EventBridge (Professional/Enterprise) or polls directly (Starter). No model reaches deployment without passing through this registry.

  • SSM output: /ml/{product-name}/ModelPackageGroupArn

AWS::CodeCommit::Repository

The AWS::CodeCommit::Repository resource is an AWS CloudFormation resource used to automatically provision and configure private, hosted Git repositories directly within the Amazon Web Services ecosystem. Instead of manually navigating the AWS Management Console, developers use this infrastructure-as-code (IaC) resource to spin up secure source control environments dynamically.

Here are the key roles and purposes:

Cloud-Based Git Hosting

It provisions a fully managed version control repository that supports all standard Git commands and workflows. It operates similarly to platforms like GitHub or GitLab but runs entirely within your AWS architecture.

Native AWS Ecosystem Integration

Unlike third-party Git hosts, a repository created via this resource seamlessly links with internal AWS services:

  • CI/CD Pipelines: Automatically triggers automated software release workflows inside AWS CodePipeline whenever code is pushed.

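  • Event-Driven Actions: Uses the Triggers property to invoke AWS Lambda functions or publish to Amazon SNS topics on repository events such as pushes to a branch or the creation and deletion of branches and tags. Pull request events are not covered by triggers; they are available through Amazon EventBridge and notification rules instead.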

Identity and Access Management (IAM)

Authentication and repository permissions are controlled through standard AWS IAM users, roles, and policies, so you do not need a separate user credential database for your developers. Git access uses IAM-backed credentials: HTTPS Git credentials, the AWS CLI credential helper, or SSH public keys associated with an IAM user.

Scalability and Security Management

  • Data Protection: Repositories are encrypted at rest by default. The KmsKeyId property lets you specify a customer managed AWS Key Management Service (KMS) key in place of the default AWS managed key.

  • No Infrastructure Overhead: The underlying storage scales behind the scenes to handle large repositories and lengthy revision histories without any disk space configuration required.

In the ML Provisioner

  • Tier: Starter, Professional, Enterprise (all tiers)

  • Count: 2 per stack — model-build and model-deploy

  • Role: Provides the source control foundation for the two ML pipelines. The model-build repo holds training and evaluation code; the model-deploy repo holds deployment and inference code. Each repo is the source stage trigger for its corresponding CodePipeline pipeline.

  • Fallback: When source_control: s3 is set in the client config, no CodeCommit repositories are created. An S3 prefix is used as the pipeline source stage instead (see Design Decision 3).

  • SSM output: /ml/{product-name}/RepositoryUrl

AWS::CodeBuild::Project

AWS::CodeBuild::Project is a CloudFormation resource that automates the creation and configuration of an AWS CodeBuild project.

AWS CodeBuild is a fully managed build service. It compiles source code, runs tests, and produces ready-to-deploy software packages. Using this CloudFormation resource allows you to define your build infrastructure as code.

Core purposes

  • Automates Build Environments: Eliminates manual setup of build servers by provisioning transient, isolated Docker containers for every build.

  • Standardizes Build Pipelines: Ensures every code compilation and test execution runs in an identical, predictable environment.

  • Enforces Security: Integrates with AWS Identity and Access Management (IAM) to strictly control access to source code and build artifacts.

  • Scales Automatically: Scales up and down automatically to handle peak build volumes without server management.

Key Components Configured by the Resource

  • Source: Specifies where the code lives, such as AWS CodeCommit, GitHub, Bitbucket, or Amazon S3.

  • Environment: Defines the operating system, programming language runtime, Docker image, and compute size (CPU/Memory) used for the build.

  • ServiceRole: Points to the IAM role that grants CodeBuild permission to interact with other AWS services like Amazon S3 or Amazon ECR.

  • Artifacts: Defines where the completed build outputs (e.g., .jar, .zip, Docker images) are sent, typically an Amazon S3 bucket.

  • BuildSpec: Contains the sequence of commands and phases (install, pre_build, build, post_build) that CodeBuild executes. It can be embedded directly in the CloudFormation template (as the BuildSpec property under Source) or placed in the root of the source code repository as a buildspec.yml file.

In the ML Provisioner

  • Tier: Starter, Professional, Enterprise (all tiers)

  • Count: 2 per stack — build and deploy

  • Role: Executes the ML pipeline stages. The build project runs model training, evaluation, and registration into the Model Registry. The deploy project runs deployment and inference endpoint configuration. Both projects use aws/codebuild/standard:7.0 and a buildspec.yml at the repo root.

  • ServiceRole: Each project is assigned the codebuild-role IAM role provisioned in the same stack.

AWS::CodePipeline::Pipeline

AWS::CodePipeline::Pipeline is a CloudFormation resource used to automate your software release processes. It allows you to define, configure, and deploy continuous integration and continuous delivery (CI/CD) pipelines as code.

Core Purposes

  • Automate Software Releases: Eliminates manual steps by automatically triggering workflow sequences from code check-in to production deployment.

  • Enforce Workflow Structure: Defines the exact sequence of stages (e.g., Source, Build, Test, Deploy) that code changes must pass through.

  • Manage Artifact Flow: Handles the secure transfer of input and output files (artifacts) between different lifecycle stages via Amazon S3.

  • Integrate AWS Services: Connects natively with AWS services like CodeCommit, CodeBuild, CodeDeploy, ECS, Lambda, and S3.

  • Incorporate Third-Party Tools: Integrates external DevOps tools like GitHub, Jenkins, and SonarQube directly into the release flow.

  • Control Release Gates: Supports manual approval actions to halt the pipeline until a human reviews and approves the changes.

  • Implement Infrastructure as Code: Enables version control, auditing, and replication of your CI/CD architecture across multiple AWS accounts.

Essential Structure

A pipeline resource is built using three primary components:

  • Stages: Sequential logical divisions (e.g., “BuildStage”, “ProdDeployment”) that contain one or more actions.

  • Actions: Specific tasks executed within a stage, such as pulling code, running a build script, or deploying a container.

  • ArtifactStore / ArtifactStores: The Amazon S3 bucket (or, for cross-Region pipelines, one bucket per Region) used to store the intermediate files passed between the pipeline actions.
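
To make the three components concrete, here is a minimal sketch of the Properties section of a two-stage pipeline expressed as a Python dictionary, plus a small structural check. Every name, ARN, and bucket below is a hypothetical placeholder; only the key layout follows the AWS::CodePipeline::Pipeline schema. The check encodes two CodePipeline rules: a pipeline needs at least two stages, and the first stage may contain only source actions.

```python
# Hypothetical names throughout; only the key layout follows the
# AWS::CodePipeline::Pipeline resource schema.
pipeline_properties = {
    "RoleArn": "arn:aws:iam::111122223333:role/example-codepipeline-role",
    "ArtifactStore": {"Type": "S3", "Location": "example-artifact-bucket"},
    "Stages": [
        {
            "Name": "Source",
            "Actions": [
                {
                    "Name": "FetchSource",
                    "ActionTypeId": {
                        "Category": "Source",
                        "Owner": "AWS",
                        "Provider": "CodeCommit",
                        "Version": "1",
                    },
                    "Configuration": {
                        "RepositoryName": "example-model-build",
                        "BranchName": "main",
                    },
                    "OutputArtifacts": [{"Name": "SourceOutput"}],
                }
            ],
        },
        {
            "Name": "Build",
            "Actions": [
                {
                    "Name": "RunBuild",
                    "ActionTypeId": {
                        "Category": "Build",
                        "Owner": "AWS",
                        "Provider": "CodeBuild",
                        "Version": "1",
                    },
                    "Configuration": {"ProjectName": "example-build-project"},
                    "InputArtifacts": [{"Name": "SourceOutput"}],
                }
            ],
        },
    ],
}


def check_structure(properties: dict) -> list[str]:
    """Return a list of structural problems (empty when none are found)."""
    problems = []
    stages = properties.get("Stages", [])
    if len(stages) < 2:
        problems.append("a pipeline needs at least two stages")
    if stages:
        categories = {a["ActionTypeId"]["Category"] for a in stages[0]["Actions"]}
        if categories != {"Source"}:
            problems.append("the first stage may contain only source actions")
    return problems


print(check_structure(pipeline_properties))  # prints []
```

The artifact name SourceOutput ties the two stages together: the build action consumes exactly what the source action produced, and the files travel through the S3 artifact store.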

In the ML Provisioner

  • Tier: Starter, Professional, Enterprise (all tiers)

  • Count: 2 per stack — build-pipeline and deploy-pipeline

  • Role: Orchestrates the end-to-end ML workflow. The build-pipeline moves code from the source repo through the CodeBuild build project and into the Model Registry. The deploy-pipeline moves an approved model version from the Model Registry through the CodeBuild deploy project to the target environment. In Professional and Enterprise tiers, the deploy pipeline is triggered automatically on model approval by the EventBridge rule (AWS::Events::Rule) in the same stack.

  • ArtifactStore: Uses the AWS::S3::Bucket provisioned in the same stack.

  • SSM outputs: /ml/{product-name}/BuildPipelineArn, /ml/{product-name}/DeployPipelineArn

AWS::IAM::Role

The AWS::IAM::Role resource in AWS CloudFormation creates an Identity and Access Management (IAM) role. This identity defines a set of permissions for making AWS service requests without using permanent credentials.

Primary Purposes

  • Grants Temporary Credentials: It provides temporary security tokens through the AWS Security Token Service (STS).

  • Delegates Access: It allows users, applications, or AWS services to access resources across different AWS accounts.

  • Secures AWS Services: It permits services like Amazon EC2 or AWS Lambda to securely interact with other resources like Amazon S3 or DynamoDB.

  • Enables Federated Identity: It allows external users (via SAML or OpenID Connect) to log into the AWS Management Console or access APIs.

Core Components Defined in CloudFormation

  • AssumeRolePolicyDocument: This mandatory trust policy defines who or what is allowed to assume the role.

  • Policies: These inline policies define what actions the identity can perform on specific resources.

  • ManagedPolicyArns: These attach pre-defined AWS-managed or customer-managed policies to the role.

  • PermissionsBoundary: This sets the maximum permissions limit that the role can ever possess.
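
As an illustration of the AssumeRolePolicyDocument, the sketch below builds a trust policy that lets a single AWS service assume a role. The helper name trust_policy is hypothetical; the document layout and the codebuild.amazonaws.com service principal are standard IAM.

```python
import json


def trust_policy(service_principal: str) -> dict:
    """Build an AssumeRolePolicyDocument that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Trust policy for a role that CodeBuild projects assume.
codebuild_trust = trust_policy("codebuild.amazonaws.com")
print(json.dumps(codebuild_trust, indent=2))
```

The trust policy only answers who may assume the role. What the role can do once assumed is defined separately through Policies and ManagedPolicyArns.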

Common Use Cases

  • EC2 Instance Profiles: Attaching a role to an EC2 instance so applications running on it can access S3 buckets safely.

  • Lambda Execution: Giving a Lambda function the exact permissions needed to write logs to CloudWatch and read from a database.

  • Cross-Account Access: Allowing a deployment tool in a “Dev” AWS account to update resources inside a “Prod” AWS account.

In the ML Provisioner

  • Tier: Starter (x3), Professional (x4), Enterprise (x4)

  • Roles provisioned:

    • codebuild-role — grants CodeBuild projects access to S3, CodeCommit, SageMaker, and CloudWatch Logs

    • codepipeline-role — grants CodePipeline permission to trigger CodeBuild and pass artifacts via S3

    • sagemaker-execution-role — grants SageMaker permission to access S3, ECR, and CloudWatch for training and inference jobs. ARN stored in SSM at /ml/{product-name}/SageMakerRoleArn

    • pipe-execution-role (Professional/Enterprise) — least-privilege role allowing the EventBridge Pipe to trigger the Deploy pipeline only

AWS::SSM::Parameter

AWS::SSM::Parameter is an AWS CloudFormation resource used to create and manage configuration data in AWS Systems Manager (SSM) Parameter Store.

Core Purposes

  • Centralized Configuration: Stores application data, URLs, and settings in one place.

  • Separation of Code and Data: Keeps environment-specific values out of your application code.

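  • Secret Management: Parameter Store can secure sensitive data like API keys and passwords with encrypted SecureString parameters, but this CloudFormation resource only creates String and StringList types.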

  • Stack Decoupling: Shares configuration data safely across different CloudFormation stacks.

  • Dynamic Referencing: Allows other CloudFormation templates to fetch values dynamically during deployment.

In the ML Provisioner

  • Tier: Starter (x3), Professional (x5), Enterprise (x9)

  • Role: Stores resource ARNs and identifiers at deployment time for consumption by downstream provisioners and automation — no secrets are stored.

  • Parameters written per tier:

    • Starter: ModelPackageGroupArn, RepositoryUrl, SageMakerRoleArn

    • Professional adds: BucketName, DashboardName

    • Enterprise adds: KmsKeyArn, LogGroupName, VpcEndpointIds, SecurityGroupId

  • Path convention: /ml/{product-name}/{ParameterName} — e.g., /ml/globalbank-prod-c001-us-west-2-fraud-detection-ml/ModelPackageGroupArn

  • Primary consumer: SageMaker Provisioner (next module), which reads KmsKeyArn, SageMakerRoleArn, and BucketName to configure Studio domains without manual cross-referencing.
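
The path convention can be sketched as a small helper that builds and sanity-checks parameter names. This is an illustration only: the function name and the validation rules (lowercase letters, digits, and hyphens for the product name, alphanumeric parameter names) are assumptions inferred from the example path above, not rules stated by the provisioner.

```python
import re

# Assumed constraint for illustration: product names use lowercase
# letters, digits, and hyphens, as in the example path above.
_PRODUCT_NAME = re.compile(r"[a-z0-9][a-z0-9-]*")


def ssm_parameter_path(product_name: str, parameter_name: str) -> str:
    """Build a path following the /ml/{product-name}/{ParameterName} convention."""
    if not _PRODUCT_NAME.fullmatch(product_name):
        raise ValueError(f"unexpected product name: {product_name!r}")
    if not parameter_name.isalnum():
        raise ValueError(f"unexpected parameter name: {parameter_name!r}")
    return f"/ml/{product_name}/{parameter_name}"


print(ssm_parameter_path(
    "globalbank-prod-c001-us-west-2-fraud-detection-ml",
    "ModelPackageGroupArn",
))
# prints /ml/globalbank-prod-c001-us-west-2-fraud-detection-ml/ModelPackageGroupArn
```

Centralizing path construction in one function keeps writers (this stack) and readers (downstream provisioners) from drifting apart on naming.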

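Parameter Types and Organization

  • String: Stores plain text data, such as database names or environment identifiers.

  • StringList: Stores comma-separated values, useful for lists of subnets or security groups.

  • SecureString: Encrypts sensitive data using AWS Key Management Service (KMS) keys. CloudFormation cannot create this type through AWS::SSM::Parameter, so SecureString parameters must be created outside the template.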

  • Hierarchical Paths: Organizes parameters using a folder-like structure (e.g., /prod/db/username).

Common Use Cases

  • Passing Environment Variables: Injecting runtime parameters into AWS Lambda functions or Amazon ECS tasks.

  • Cross-Stack References: Writing a resource ID to Parameter Store in an infrastructure stack so an application stack can read it.

  • Automation Pipelines: Storing AMI IDs that change frequently for Auto Scaling groups to query during updates.

AWS::Events::Rule

AWS::Events::Rule creates an Amazon EventBridge (formerly CloudWatch Events) rule to route events to targets. It monitors changes in your AWS environment or schedules recurring tasks, then triggers automated responses.

Primary Purposes

  • Event-Driven Automation: Detects real-time changes in AWS resources (e.g., an EC2 instance changing state) and automatically triggers actions.

  • Cron and Rate Scheduling: Acts as a serverless cron job to trigger tasks at specific times or intervals (e.g., running a Lambda function every night).

  • Decoupled Architecture: Connects event sources to downstream processing targets without hardcoding dependencies between them.

  • Custom Application Routing: Routes custom events published by your own applications to the correct processing microservices.

  • SaaS Integration: Ingests and routes events from third-party SaaS applications (like Zendesk or Datadog) into AWS.

Key Capabilities

  • Pattern Matching: Filters complex JSON event payloads using precise rules to match specific fields, field prefixes, or numeric ranges.

  • Multi-Target Routing: Sends a single matched event to up to five different targets simultaneously.

  • Input Transformation: Redacts, reshapes, or extracts specific data from the event JSON before delivering it to the target.

Common Targets Supported

  • AWS Lambda functions

  • Amazon SNS topics and SQS queues

  • Amazon ECS tasks

  • AWS Step Functions state machines

  • API Destinations (external HTTP endpoints)

In the ML Provisioner

  • Tier: Professional, Enterprise

  • Count: 1 per stack

  • Role: Monitors the Model Registry for model approval status changes. When a model version transitions to Approved, this rule fires and triggers the Deploy pipeline directly as a Rule target. No intermediate resource (Lambda or Pipe) is required — EventBridge Rules natively support CodePipeline as a target.

  • Event pattern: Matches aws.sagemaker events of type SageMaker Model Package State Change where ModelApprovalStatus is Approved.

  • Target: The deploy-pipeline CodePipeline pipeline, invoked directly via its ARN.

  • Target role: The codepipeline-role IAM role provisioned in the same stack.
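
The event pattern described above can be written out as follows, together with a deliberately simplified matcher that shows how the pattern selects events. The matcher handles only exact-value matching and is not a reimplementation of EventBridge, which supports more operators (prefix, numeric ranges, and others). The sample events are trimmed and the group name is hypothetical.

```python
# Event pattern as it would appear in the rule's EventPattern property.
EVENT_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {"ModelApprovalStatus": ["Approved"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must exist in the event and the
    event value must equal one of the listed values. Nested dicts recurse.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Trimmed, illustrative events; the group name is hypothetical.
approved_event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {
        "ModelPackageGroupName": "example-group",
        "ModelApprovalStatus": "Approved",
    },
}
pending_event = {**approved_event, "detail": {"ModelApprovalStatus": "PendingManualApproval"}}

print(matches(EVENT_PATTERN, approved_event))  # prints True
print(matches(EVENT_PATTERN, pending_event))   # prints False
```

Fields present in the event but absent from the pattern, such as ModelPackageGroupName here, are ignored; adding that field to the pattern would narrow the rule to a single model group.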

AWS::CloudWatch::Dashboard

The AWS::CloudWatch::Dashboard resource in AWS CloudFormation is used to automate the creation, deployment, and management of customized telemetry dashboards in Amazon CloudWatch. It allows you to visualize infrastructure health, application performance, and operational metrics as code.

Core Purposes

  • Infrastructure as Code (IaC): Defines operational dashboards inside CloudFormation templates to spin them up automatically alongside your resources.

  • Centralized Monitoring: Combines metrics, logs, and alarms from multiple AWS resources and regions into a single, cohesive visual interface.

  • Team Standardization: Deploys identical, pre-configured monitoring layouts across different AWS accounts (e.g., Development, Staging, Production).

  • Faster Incident Response: Provides DevOps and SRE teams with immediate visual context during system outages or performance degradation.

  • Resource Lifecycles: Deletes or updates monitoring views automatically when the underlying application stack is modified or torn down.

Key Capabilities

  • Dynamic Customization: Supports various visualization widgets including line charts, stacked area graphs, single-value stats, alarm statuses, and markdown text blocks.

  • Log Insights Integration: Embeds CloudWatch Logs Insights query results directly into the visual grid to view structured log data next to performance charts.

  • Flexible Layouts: Uses a coordinate grid system ((X, Y) axes with height and width parameters) to precisely control widget positioning and sizing.

  • Variable Timelines: Allows users to change time horizons globally or set specific time frames per individual widget.

In the ML Provisioner

  • Tier: Professional, Enterprise

  • Count: 1 per stack

  • Role: Provides a pre-built monitoring view for the ML pipeline stack. Surfaces key operational metrics — pipeline execution status, CodeBuild durations, and model approval events — in a single pane without requiring manual dashboard setup.

  • SSM output: /ml/{product-name}/DashboardName

Basic Syntax Example

MyCloudWatchDashboard:
  Type: AWS::CloudWatch::Dashboard
  Properties:
    DashboardName: Production-Overview-Dashboard
    DashboardBody: '{"widgets":[{"type":"metric","x":0,"y":0,"width":12,"height":6,"properties":{"metrics":[["AWS/EC2","CPUUtilization","InstanceId","i-0123456789abcdef0"]],"period":300,"stat":"Average","region":"us-east-1","title":"EC2 CPU Utilization"}}]}'
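
Because DashboardBody must be a JSON string, hand-writing it inline becomes error-prone as dashboards grow. One option, sketched below under the assumption that a generator script produces the template, is to build the body as a Python structure and serialize it. The widget mirrors the example above; the fits_grid helper is hypothetical and checks the widget against the 24-column dashboard grid.

```python
import json

GRID_COLUMNS = 24  # CloudWatch dashboards use a 24-column grid

# Same widget as the YAML example, expressed as a Python structure.
dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]],
                "period": 300,
                "stat": "Average",
                "region": "us-east-1",
                "title": "EC2 CPU Utilization",
            },
        }
    ]
}


def fits_grid(widget: dict) -> bool:
    """Check that a widget stays inside the horizontal grid bounds."""
    return widget["x"] >= 0 and widget["width"] >= 1 and widget["x"] + widget["width"] <= GRID_COLUMNS


assert all(fits_grid(w) for w in dashboard_body["widgets"])

# DashboardBody expects a JSON string, so serialize before templating.
dashboard_body_json = json.dumps(dashboard_body, separators=(",", ":"))
print(dashboard_body_json)
```

Serializing from a structure guarantees valid JSON and makes it easy to add widgets programmatically, for example one per pipeline.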

AWS::IAM::ManagedPolicy

The AWS::IAM::ManagedPolicy resource in AWS CloudFormation creates a standalone Identity and Access Management (IAM) policy that you can attach to multiple users, groups, or roles.

Primary Purposes

  • Centralized Permission Management: Define a security policy once and apply it to multiple IAM identities.

  • Reusability: Can be shared across many users, groups, and roles, which avoids duplicating policy JSON.

  • Automatic Updates: Changing the managed policy automatically updates permissions for all attached identities.

  • Version Control: IAM stores up to five versions of a managed policy, so a previous version can be restored quickly if an error occurs.

  • Granular Access Control: Enforces least-privilege access by explicitly defining allowed or denied AWS actions and resources.

Key Use Cases

  • Job Function Policies: Creating standard profiles like ReadOnlyAccess or DatabaseAdministrator for specific team roles.

  • Compliance Enforcement: Applying mandatory security baselines across all development teams.

  • Cross-Account Roles: Managing permissions for external or cross-account access securely.

In the ML Provisioner

  • Tier: Professional (x2), Enterprise (x3)

  • Role: Provides reusable, versioned permission sets attached to IAM roles in the stack.

    • Professional: enhanced-build-policy and enhanced-deploy-policy — extend the base CodeBuild and CodePipeline roles with additional permissions required for monitoring and artifact management

    • Enterprise adds: permission-boundary — sets the maximum permissions ceiling for all IAM roles in the stack, enforcing least-privilege at the boundary level regardless of what policies are attached

AWS::KMS::Key

AWS::KMS::Key creates and manages a logical AWS Key Management Service (KMS) key. This CloudFormation resource acts as a root of trust. It provides secure cryptographic keys used to encrypt and decrypt data across your AWS ecosystem.

Primary Purposes

  • Data Encryption: Generates the master keys needed to encrypt data at rest.

  • Access Control: Attaches key policies to define exactly who can use or manage the key.

  • Cryptographic Agility: Supports symmetric, asymmetric, and Hash-Based Message Authentication Code (HMAC) keys.

  • Regulatory Compliance: Supports cryptographic erasure (by deleting the key) and optional automatic key rotation.

  • AWS Service Integration: Acts as the security backbone for services like Amazon S3, EBS, and DynamoDB.

Core Configurations and Features

  • Multi-Region Replication: Can be configured as a primary or replica key to decrypt data across different global AWS regions.

  • Key Spec Selection: Allows choice between standard symmetric encryption (AES-256) and asymmetric signing/encryption (RSA or Elliptic Curve).

  • Automatic Rotation: The EnableKeyRotation boolean turns on automatic rotation of the underlying cryptographic material for symmetric encryption keys, by default once a year.

  • Deletion Protection: Includes a mandatory waiting period (7 to 30 days) before a key is permanently destroyed to prevent accidental data loss.

Typical Use Case Architecture

[ Your Application / AWS Service ] 
         │
         │ Request Encrypt/Decrypt
         ▼
┌────────────────────────────────────────┐
│             AWS::KMS::Key              │
│  ├── Cryptographic Material (Secret)   │
│  └── Key Policy (Who has access?)      │
└────────────────────────────────────────┘

In the ML Provisioner

  • Tier: Enterprise

  • Count: 1 per stack

  • Role: Provides the Customer Managed Key (CMK) for encryption at rest across the stack. Applied to two resources: (1) the AWS::S3::Bucket artifacts bucket via BucketEncryption, and (2) the CodePipeline artifact store via ArtifactStore.EncryptionKey. The key ARN is also stored in SSM for consumption by the SageMaker Provisioner, which applies it to SageMaker Studio EBS volumes.

  • SSM output: /ml/{product-name}/KmsKeyArn

AWS::KMS::Alias

An AWS::KMS::Alias resource in AWS CloudFormation creates a display name for an AWS Key Management Service (KMS) customer managed key.

Primary Purposes

  • Simplifies Manual Key Rotation: Code can reference the alias instead of a specific Key ID. When replacing a key, you simply point the alias to the new Key ID without changing your application code.

  • Improves Code Readability: Substitutes long, random alphanumeric strings with human-readable names (e.g., alias/MyApplicationKey).

  • Abstracts Environments: Allows identical application code or templates to run across multiple environments (Dev, Test, Prod) by using the same alias name pointing to different underlying keys in each account.

  • Controls Access Safely: IAM policies can grant permissions based on the alias name rather than the specific Key ARN, making permission management more flexible.

Key Characteristics & Constraints

  • Prefix Requirement: Every alias name must begin with the prefix alias/.

  • Reserved Names: The prefix alias/aws/ is strictly reserved for AWS managed keys and cannot be used.

  • One-to-One Mapping: An alias can only point to one KMS key at a time, though multiple aliases can point to the same KMS key.

  • Regional Scope: Aliases are regional resources and must reside in the same region as the KMS key they reference.
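
The naming constraints above lend themselves to a pre-deployment check. The following is a minimal sketch with a hypothetical function name; besides the rules listed above, it applies the KMS limits on alias names (at most 256 characters, using only letters, digits, forward slashes, underscores, and dashes).

```python
import re

# "alias/" prefix followed by at least one allowed character.
_ALIAS_PATTERN = re.compile(r"alias/[a-zA-Z0-9/_-]+")


def validate_alias_name(alias_name: str) -> None:
    """Raise ValueError if the name breaks the alias naming rules."""
    if len(alias_name) > 256:
        raise ValueError("alias name must be at most 256 characters")
    if not _ALIAS_PATTERN.fullmatch(alias_name):
        raise ValueError(
            "alias name must start with 'alias/' and use only letters, "
            "digits, '/', '_' or '-'"
        )
    if alias_name.startswith("alias/aws/"):
        raise ValueError("the 'alias/aws/' prefix is reserved for AWS managed keys")


validate_alias_name("alias/MyApplicationKey")  # passes silently
```

Running a check like this before template generation surfaces naming mistakes early instead of during stack deployment.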

In the ML Provisioner

  • Tier: Enterprise

  • Count: 1 per stack

  • Role: Provides a human-readable reference to the KMS key provisioned in the same stack. Allows other services and team members to reference the key by name (e.g., alias/globalbank-prod-c001-fraud-detection-ml-key) rather than by the raw key ARN.

AWS::Logs::LogGroup

AWS::Logs::LogGroup defines a logical grouping of log streams in Amazon CloudWatch Logs. It acts as the primary administrative blueprint for organizing, retaining, and securing system and application logs within an AWS CloudFormation template.

Core Purposes

  • Logical Organization: Group related log streams sharing the same retention, encryption, and access controls.

  • Retention Management: Define how many days AWS keeps your log data before automatic deletion.

  • Security & Encryption: Associate AWS KMS keys to encrypt sensitive log data at rest.

  • Lifecycle Management: Automate the creation, updates, and deletion of log destinations via infrastructure as code.

  • Access Control: Provide a specific Amazon Resource Name (ARN) to attach precise IAM policies.

  • Downstream Integration: Act as the mandatory source for metric filters, subscription filters, and anomaly detection.

Key Properties in CloudFormation

  • LogGroupName: Specifies a custom name; omitting it triggers an automatic, unique physical ID.

  • RetentionInDays: Controls storage costs by setting a retention period, chosen from a fixed set of allowed values ranging from 1 day to 10 years (3653 days). Omitting it keeps logs indefinitely.

  • KmsKeyId: Points to a customer managed key for strict data compliance environments.

  • Tags: Applies metadata keys and values for cost allocation tracking across environments.

In the ML Provisioner

  • Tier: Enterprise

  • Count: 1 per stack

  • Role: Captures security-relevant events for the ML stack — unauthorized API calls and root account usage. Retention is enforced at a minimum of 90 days by the security validator (generation is blocked if set lower). Acts as the compliance audit trail required by enterprise security policies.

  • SSM output: /ml/{product-name}/LogGroupName
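
The 90-day retention rule could be enforced with a check along these lines. This is a hypothetical sketch of such a validator, not the provisioner's actual security validator; the tuple lists the values RetentionInDays accepted at the time of writing and should be verified against the current CloudFormation reference.

```python
# Values accepted by RetentionInDays at the time of writing.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)
MINIMUM_RETENTION_DAYS = 90  # floor stated in this document


def check_retention(retention_in_days: int) -> None:
    """Hypothetical validator: reject retention that is invalid or too short."""
    if retention_in_days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"{retention_in_days} is not an allowed RetentionInDays value")
    if retention_in_days < MINIMUM_RETENTION_DAYS:
        raise ValueError(
            f"retention of {retention_in_days} days is below the "
            f"{MINIMUM_RETENTION_DAYS}-day minimum"
        )


check_retention(90)   # passes silently
check_retention(365)  # passes silently
```

Checking the allowed set first gives a clearer error for values such as 100, which meet the minimum but would still be rejected at deployment.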

AWS::CloudWatch::Alarm

AWS::CloudWatch::Alarm is a CloudFormation resource used to automate monitoring and trigger actions based on metric thresholds.

Core Purposes

  • Automate Incident Response: It triggers actions like Amazon SNS notifications, EC2 Auto Scaling policies, or Systems Manager OpsItems when metrics cross defined limits.

  • Monitor Resource Health: It tracks the performance of AWS resources (like CPU utilization or database connections) or custom application metrics.

  • Enable Self-Healing Infrastructure: It can automatically recover failed EC2 instances or scale application clusters up and down based on real-time demand.

  • Manage Costs: It monitors billing metrics to alert you when your AWS spend exceeds a specific budget threshold.

Key Evaluation Types

  • Metric Alarms: Watch a single CloudWatch metric or a math expression based on multiple metrics over a specific time period.

  • Anomaly Detection Alarms: Monitor metrics against expected baseline patterns rather than static thresholds, reducing false positives for cyclical workloads.

Common Use Cases

  • Auto Scaling: Scaling an ECS service when average memory utilization stays above 80% for 5 minutes.

  • System Recovery: Recovering an EC2 instance automatically if it fails its system status checks.

  • Application Operations: Notifying an engineering team through PagerDuty (integrated via an SNS topic) if an API Gateway API returns a high rate of 5XX server errors.

In the ML Provisioner

  • Tier: Enterprise

  • Count: 2 per stack — unauthorized-api-calls and root-account-usage

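  • Role: Monitors the CloudWatch Log Group for security violation patterns. Because alarms evaluate metrics rather than raw log lines, each pattern must first be surfaced as a metric (typically through a metric filter on the log group). When a violation is detected, the alarm transitions to ALARM state and publishes a notification to the AWS::SNS::Topic provisioned in the same stack, triggering the alert subscription chain.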

AWS::SNS::Topic

An AWS::SNS::Topic resource creates a logical access point and communication channel in Amazon Simple Notification Service (SNS). It acts as a centralized hub for broadcasting asynchronous messages to multiple subscribing endpoints simultaneously.

Primary Purposes

  • Pub/Sub Messaging: Decouples message senders (publishers) from message receivers (subscribers). Senders emit messages to the topic without knowing who will consume them.

  • Event Fan-Out: Delivers a single published message to a large number of diverse subscribing endpoints in parallel.

  • System Decoupling: Isolates microservices so they can scale, fail, and update independently without breaking the entire application architecture.

Core Capabilities

  • Diverse Subscriptions: Fans out messages to Amazon SQS queues, AWS Lambda functions, HTTP/S webhooks, email addresses, and SMS mobile numbers.

  • Message Filtering: Allows subscribers to assign a filter policy to their subscription. Subscribers only receive messages matching their specific criteria, reducing unnecessary processing.

  • FIFO Ordering: Supports First-In-First-Out (FIFO) ordering when strict message sequencing and deduplication are required between applications.

  • Data Security: Provides server-side encryption (SSE) using AWS KMS keys to protect sensitive message payloads at rest.

Common Use Cases

  • Application Alerts: Sending instant SMS or email notifications to system administrators when an application error occurs.

  • Workflow Automation: Triggering multiple backend Lambda functions simultaneously when a user uploads a profile picture (e.g., resizing, facial recognition, and logging).

  • Data Replication: Broadcasting transactional updates from a primary database to multiple downstream read caches or analytics systems.

In the ML Provisioner

  • Tier: Enterprise

  • Count: 1 per stack

  • Role: Acts as the security alerts hub for the ML stack. Receives notifications from the two AWS::CloudWatch::Alarm resources when security violations are detected. Fans out to the AWS::SNS::Subscription for delivery to the configured endpoint (e.g., the security team’s email or ticketing system).

AWS::SNS::Subscription

The AWS::SNS::Subscription resource in AWS CloudFormation is used to link an Amazon Simple Notification Service (SNS) topic to a specific endpoint so that messages published to the topic are delivered to that endpoint.

Primary Purposes

  • Endpoint Registration: Connects an SNS topic to a designated destination to automate message delivery.

  • Protocol Configuration: Specifies the communication method (e.g., Lambda, SQS, Email, HTTP) used to transmit the message.

  • Content Filtering: Applies subscription filter policies, which SNS evaluates before delivery, to ensure the subscriber only receives a specific subset of messages.

  • Delivery Optimization: Configures dead-letter queues (DLQ) and retry policies to handle message delivery failures safely.

  • Scope Automation: Allows multi-account or multi-region subscription setups through infrastructure-as-code automation.

Supported Protocols and Targets

  • sqs: Routes messages directly to an Amazon SQS queue for asynchronous processing.

  • lambda: Invokes an AWS Lambda function automatically whenever a message is published.

  • http / https: Delivers the message as a POST request to an external webhook or application server.

  • email / email-json: Sends text or JSON-formatted notifications directly to a user’s inbox.

  • sms: Transmits short text messages straight to mobile phone numbers.

  • firehose: Streams notifications directly into an Amazon Data Firehose delivery stream for archiving.

Key Configuration Features

  • FilterPolicy: A JSON policy that SNS evaluates against message attributes (or the message body, depending on FilterPolicyScope) to discard unwanted messages before delivery.

  • RedrivePolicy: Targets an SQS queue to capture messages that fail all delivery retry attempts.

  • DeliveryPolicy: Sets up custom retry strategies (exponential backoff) for HTTP/S endpoints.

  • RawMessageDelivery: Strips the SNS metadata wrapper, sending only the exact payload string to SQS, HTTP/S, or Firehose endpoints.
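
As a rough illustration of how these properties fit together, the sketch below assembles an AWS::SNS::Subscription resource as a Python dictionary that could be serialized into a CloudFormation template. The property names (TopicArn, Protocol, Endpoint, FilterPolicy, RedrivePolicy, RawMessageDelivery) are the CloudFormation ones; the helper name, the ARNs, and the filter values are placeholders, and nothing here is taken from the ML Provisioner's actual generator.

```python
import json

# Protocols for which raw message delivery is supported.
RAW_DELIVERY_PROTOCOLS = {"sqs", "http", "https", "firehose"}


def build_subscription(topic_arn, protocol, endpoint,
                       filter_policy=None, dlq_arn=None, raw=False):
    """Build an AWS::SNS::Subscription resource definition (hypothetical helper)."""
    properties = {"TopicArn": topic_arn, "Protocol": protocol, "Endpoint": endpoint}
    if filter_policy is not None:
        properties["FilterPolicy"] = filter_policy
    if dlq_arn is not None:
        # Messages that exhaust their delivery retries are sent to this SQS queue.
        properties["RedrivePolicy"] = {"deadLetterTargetArn": dlq_arn}
    if raw:
        if protocol not in RAW_DELIVERY_PROTOCOLS:
            raise ValueError(f"raw delivery is not supported for protocol {protocol!r}")
        properties["RawMessageDelivery"] = True
    return {"Type": "AWS::SNS::Subscription", "Properties": properties}


# Placeholder ARNs for illustration only.
resource = build_subscription(
    topic_arn="arn:aws:sns:us-east-1:123456789012:example-topic",
    protocol="sqs",
    endpoint="arn:aws:sqs:us-east-1:123456789012:example-queue",
    filter_policy={"severity": ["critical"]},
    dlq_arn="arn:aws:sqs:us-east-1:123456789012:example-dlq",
    raw=True,
)
print(json.dumps(resource, indent=2))
```

Rejecting raw delivery for unsupported protocols at build time is a design choice of this sketch; it surfaces the mistake before a stack deployment fails.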

In the ML Provisioner

  • Tier: Enterprise

  • Count: 1 per stack

  • Role: Connects the AWS::SNS::Topic to the security team’s notification endpoint. Delivers security violation alerts when a CloudWatch Alarm fires. The protocol and endpoint (e.g., email address, SQS queue, HTTP webhook) are configured via the client YAML config.

AWS::EC2::VPCEndpoint

An AWS::EC2::VPCEndpoint resource creates a private connection between your Virtual Private Cloud (VPC) and supported AWS services or SaaS applications. It removes the need for an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection to reach those services.

Core Purposes

  • Enables Private Connectivity: Traffic between your VPC and the target service does not leave the Amazon network.

  • Enhances Network Security: Instances in private subnets can access AWS services without public IP addresses.

  • Reduces Internet Exposure: Eliminates risks associated with routing sensitive data over the public internet.

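  • Lowers Data Transfer Costs: Can avoid NAT gateway data processing fees for traffic to supported AWS services. Gateway endpoints are free, while interface endpoints carry their own hourly and per-GB charges.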

  • Enforces Access Control: Attaches endpoint policies to restrict which identities or actions can use the endpoint.

Endpoint Types Created

  • Gateway Endpoints: Provisions a routing target in your VPC route table for Amazon S3 and DynamoDB (free of charge).

  • Interface Endpoints: Provisions an Elastic Network Interface (ENI) with a private IP address from your subnet for most AWS services.

  • Gateway Load Balancer Endpoints: Intercepts traffic and routes it to security appliances for inspection.

In the ML Provisioner

  • Tier: Enterprise

  • Count: 4 per stack

  • Role: Enables SageMaker VPC-only mode by providing private connectivity to the four service endpoints SageMaker requires, without routing traffic over the public internet:

    • sagemaker.api — Interface endpoint for SageMaker control plane API calls

    • sagemaker.runtime — Interface endpoint for SageMaker inference endpoint invocations

    • s3 — Gateway endpoint for S3 access (free of charge)

    • sts — Interface endpoint for IAM role assumption within the VPC

  • VPC source: Read from SSM Parameter Store at the path configured in vpc_integration.vpc_parameter_store_path

  • SSM output: /ml/{product-name}/VpcEndpointIds
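
The four endpoints listed above could be expressed as CloudFormation resources along the following lines. This is a minimal sketch, not the provisioner's actual template logic: the helper name, the logical IDs, the sample IDs, and the choice to enable private DNS are assumptions, and the service names assume the standard com.amazonaws.{region}.{service} naming pattern. Interface endpoints take subnets and security groups, whereas the gateway endpoint is attached to route tables instead.

```python
# (service suffix, endpoint type) for the four endpoints described above.
ENDPOINTS = [
    ("sagemaker.api", "Interface"),
    ("sagemaker.runtime", "Interface"),
    ("s3", "Gateway"),
    ("sts", "Interface"),
]


def build_vpc_endpoints(region, vpc_id, subnet_ids, security_group_id, route_table_ids):
    """Return a dict of logical ID -> AWS::EC2::VPCEndpoint resource (hypothetical helper)."""
    resources = {}
    for suffix, kind in ENDPOINTS:
        props = {
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{suffix}",
            "VpcEndpointType": kind,
        }
        if kind == "Interface":
            # Interface endpoints place ENIs in subnets, guarded by a security group.
            props["SubnetIds"] = list(subnet_ids)
            props["SecurityGroupIds"] = [security_group_id]
            props["PrivateDnsEnabled"] = True  # assumption, not stated in the source
        else:
            # Gateway endpoints are route-table targets and take no security group.
            props["RouteTableIds"] = list(route_table_ids)
        # e.g. "sagemaker.api" -> "SagemakerApiEndpoint"
        logical_id = "".join(part.capitalize() for part in suffix.split(".")) + "Endpoint"
        resources[logical_id] = {"Type": "AWS::EC2::VPCEndpoint", "Properties": props}
    return resources


# Placeholder IDs for illustration only.
endpoints = build_vpc_endpoints(
    region="us-east-1",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    security_group_id="sg-0123456789abcdef0",
    route_table_ids=["rtb-0123456789abcdef0"],
)
for logical_id, resource in endpoints.items():
    print(logical_id, resource["Properties"]["ServiceName"])
```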

AWS::EC2::SecurityGroup

AWS::EC2::SecurityGroup is a CloudFormation resource used to create and configure virtual firewalls for Amazon EC2 instances and other AWS services to control inbound and outbound network traffic.

Core Purposes

  • Traffic Filtering: Acts as an instance-level firewall controlling both inbound (ingress) and outbound (egress) network traffic.

  • Stateful Inspection: Tracks connection states automatically. If an inbound request is allowed, the outbound response is automatically permitted regardless of outbound rules.

  • Default Deny: Blocks all incoming traffic by default until you explicitly add rules to permit it.

  • Implicit Egress: Allows all outbound traffic by default upon creation unless custom egress rules are defined.

Common Use Cases

  • Multi-Tier Isolation: Segregates web servers, application servers, and databases by allowing communication only between specific tiers.

  • Resource Referencing: Allows rules to reference other security groups instead of hardcoded IP addresses, enabling dynamic scaling.

  • Access Control: Restricts administrative access (like SSH or RDP) to specific corporate IP ranges.

  • Microservice Security: Isolates individual containers or Lambda functions running inside a Virtual Private Cloud (VPC).

In the ML Provisioner

  • Tier: Enterprise

  • Count: 1 per stack — conditional on vpc_integration.mode: standalone

  • Role: Controls inbound and outbound traffic to the four VPC endpoints provisioned in the same stack. Created only in standalone mode (when the client does not have the SG Provisioner). In SG Provisioner mode, the existing security group managed by the SG Provisioner is reused instead — its ID is read from SSM at the path configured in vpc_integration.sg_parameter_store_path.

  • Security validator: Blocks generation if SSH (port 22) or RDP (port 3389) is open to 0.0.0.0/0.

  • SSM output: /ml/{product-name}/SecurityGroupId
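
The security validator rule above might be checked roughly as follows. This is a sketch under stated assumptions rather than the provisioner's real validator: it assumes ingress rules are given as dictionaries using the CloudFormation property names IpProtocol, FromPort, ToPort, and CidrIp, it treats port ranges that include 22 or 3389 as violations, and it inspects only TCP and all-protocol (-1) rules against the IPv4 range 0.0.0.0/0. The function name is hypothetical.

```python
SENSITIVE_PORTS = {22: "SSH", 3389: "RDP"}
OPEN_CIDR = "0.0.0.0/0"


def find_violations(ingress_rules):
    """Return a message for each sensitive port exposed to the whole internet."""
    violations = []
    for rule in ingress_rules:
        if rule.get("CidrIp") != OPEN_CIDR:
            continue
        protocol = str(rule.get("IpProtocol", ""))
        if protocol == "-1":
            # "-1" means all protocols, which implies all ports.
            low, high = 0, 65535
        elif protocol in ("tcp", "6"):
            low, high = int(rule["FromPort"]), int(rule["ToPort"])
        else:
            continue
        for port, name in sorted(SENSITIVE_PORTS.items()):
            if low <= port <= high:
                violations.append(f"{name} (port {port}) is open to {OPEN_CIDR}")
    return violations


rules = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "CidrIp": "0.0.0.0/0"},
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": "10.0.0.0/8"},
    {"IpProtocol": "tcp", "FromPort": 0, "ToPort": 1024, "CidrIp": "0.0.0.0/0"},
]
problems = find_violations(rules)
if problems:
    print("Generation blocked:", problems)
```

Checking the port range rather than only exact matches matters because a rule such as 0 to 1024 exposes SSH even though port 22 is never named explicitly.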

Key Technical Attributes

  • VPC Bound: Tied directly to a specific VPC and cannot control traffic for resources outside that VPC.

  • Permissive Rules: Supports “allow” rules only; you cannot create “deny” rules.

  • Immediate Effect: Modifications to rules apply to associated resources immediately without rebooting.