Policy Guide
Overview
This guide helps you understand and choose the right IAM policy access levels for your MLOps team. Each policy type (S3, ECR, Pipeline, Inference) offers multiple access levels designed around real-world use cases and security best practices.
Key Principles:
Least Privilege - Start with minimal access, expand only when needed
Separation of Duties - Different access for humans vs automation
Environment Isolation - Production access requires explicit permission
Audit Trail - Actions are logged in AWS CloudTrail for compliance (S3 object-level operations are recorded only when CloudTrail data events are enabled)
Quick Reference
| Policy Type | Access Levels | Typical Users |
|---|---|---|
| S3 | read-only, project-buckets-only, project-buckets-full, full | Data Scientists, ML Engineers, Admins |
| ECR | read-only, dev-read-write, ci-read-write, full | Developers, CI/CD Pipelines, DevOps |
| Pipeline | read-only, project-dev, project-ci, full | ML Engineers, MLOps Admins, Auditors, CI/CD |
| Inference | read-only, read-only-invoke, dev-invoke, prod-invoke, full, deploy-only | Data Scientists, Backend Developers, Business Consumers, MLOps, CI/CD |
S3 Policies
Your S3 bucket structure follows MLOps best practices with 130+ organized folders for datasets, models, artifacts, and logs.
Level 1: read-only
Purpose: This IAM policy grants basic read-only and discovery access to your S3 environment, while restricting object-level access to buckets whose names match a naming pattern (edge-prod-b001-* in the sample below).
Typical Users:
Junior data scientists
Business analysts
Auditors and compliance reviewers
External consultants (read-only access)
What You Can Do:
✅ Discover all buckets
✅ View bucket metadata
✅ List objects in specific buckets
✅ Read files and history
✅ See version history
"Sid": "AllowListAllBuckets"
| Capability | Description |
|---|---|
| Discover all buckets | See a list of every S3 bucket in your AWS account via the console or CLI (s3:ListAllMyBuckets). |
| View bucket metadata | Retrieve the AWS Region where any bucket is located (s3:GetBucketLocation). |
"Sid": "AllowReadAndVersionAccess"
| Capability | Description |
|---|---|
| List objects in specific buckets | See the files and folders inside buckets that match the pattern edge-prod-b001-* (s3:ListBucket). |
| Read files and history | Download or view the content of objects, and their historical versions if versioning is enabled, within the matching buckets (s3:GetObject, s3:GetObjectVersion). |
| See version history | List the different versions of files within the allowed buckets (s3:ListBucketVersions). |
What You Cannot Do:
❌ No modifications
❌ No permission changes
❌ No access to other buckets' content
❌ No administrative tasks
| Restriction | Description |
|---|---|
| No modifications | Cannot perform any "write" actions, such as uploading (s3:PutObject) or deleting (s3:DeleteObject) files. |
| No permission changes | Cannot modify bucket policies or Access Control Lists (ACLs) to change who else can access the data. |
| No access to other buckets' content | Although the user can see the names of all buckets in the account, they cannot list the files inside, or download anything from, any bucket that doesn't match the edge-prod-b001-* pattern. |
| No administrative tasks | Cannot empty buckets, change lifecycle rules, or modify bucket settings such as encryption or logging. |
Example Scenario:
Sarah is a new data scientist who needs to explore existing datasets and model artifacts to understand the current ML pipeline. She doesn't need to upload anything yet; she just needs to learn the landscape.
Sample Permissions:
[
{
"Sid": "AllowListAllBuckets",
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::*"
},
{
"Sid": "AllowReadAndVersionAccess",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:ListBucket",
"s3:ListBucketVersions"
],
"Resource": [
"arn:aws:s3:::edge-prod-b001-*",
"arn:aws:s3:::edge-prod-b001-*/*"
]
}
]
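To see how these two statements combine, here is a minimal Python sketch of IAM-style Allow matching. It is a deliberate simplification, not AWS's evaluation engine: it ignores Deny statements, Condition blocks, NotAction/NotResource, SCPs, and bucket policies, and it uses fnmatch to approximate IAM's `*` and `?` wildcards. The bucket names edge-prod-b001-data and other-team-bucket are hypothetical.

```python
from fnmatch import fnmatchcase

# Statements from the read-only sample above.
READ_ONLY_STATEMENTS = [
    {
        "Sid": "AllowListAllBuckets",
        "Effect": "Allow",
        "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
        "Resource": "arn:aws:s3:::*",
    },
    {
        "Sid": "AllowReadAndVersionAccess",
        "Effect": "Allow",
        "Action": [
            "s3:GetObject",
            "s3:GetObjectVersion",
            "s3:ListBucket",
            "s3:ListBucketVersions",
        ],
        "Resource": [
            "arn:aws:s3:::edge-prod-b001-*",
            "arn:aws:s3:::edge-prod-b001-*/*",
        ],
    },
]


def _as_list(value):
    """IAM accepts either a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def is_allowed(statements, action, resource):
    """Return True if some Allow statement matches both action and resource.

    Simplified: ignores Deny, Condition, NotAction/NotResource, SCPs and bucket
    policies. Actions match case-insensitively, resources case-sensitively.
    fnmatch also treats [...] as a character class, which IAM does not.
    """
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        action_ok = any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in _as_list(stmt["Action"])
        )
        resource_ok = any(
            fnmatchcase(resource, pattern) for pattern in _as_list(stmt["Resource"])
        )
        if action_ok and resource_ok:
            return True
    return False  # no matching Allow: IAM's implicit deny applies


# Hypothetical bucket names for illustration.
print(is_allowed(READ_ONLY_STATEMENTS, "s3:GetObject",
                 "arn:aws:s3:::edge-prod-b001-data/raw-data/a.csv"))  # True
print(is_allowed(READ_ONLY_STATEMENTS, "s3:GetObject",
                 "arn:aws:s3:::other-team-bucket/a.csv"))  # False
print(is_allowed(READ_ONLY_STATEMENTS, "s3:PutObject",
                 "arn:aws:s3:::edge-prod-b001-data/a.csv"))  # False
```

The last call returns False without any Deny statement: s3:PutObject is simply never allowed, which is IAM's implicit deny at work.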
Note: ListAllMyBuckets cannot be scoped to specific buckets (AWS limitation). Users will see bucket names across the account but can only read objects from their own tenant's buckets.
Level 2: project-buckets-only
Purpose: This IAM policy supports standard data science and ML engineering workflows while preventing deletions and bucket-level changes. It explicitly allows specific operations and relies on IAM's implicit deny (no delete permissions are granted) to enforce the "Cannot Do" rules below.
Typical Users:
Data scientists (standard access)
ML engineers (development work)
Automated training jobs
Experimentation workflows
What You Can Do:
✅ Read-Only Everything
✅ Upload & Overwrite
✅ Create Folders
✅ Modify Object Tags
"Sid": "ListBucketsAndLocation"
Applied to all S3 resources (*), these actions let the user see the "big picture" in the AWS Console or via the CLI.
| Policy Action | Description |
|---|---|
| s3:ListAllMyBuckets | Allows the user to list the names of all buckets owned by the AWS account. |
| s3:GetBucketLocation | Allows the user to see which AWS Region (e.g., us-east-1) a specific bucket resides in. |
"Sid": "BucketLevelReadAndList"
These actions apply to the bucket itself, rather than the files inside it.
| Policy Action | Description |
|---|---|
| s3:ListBucket | Allows the user to list the objects (files and folders) inside the bucket. |
| s3:GetBucketVersioning | Allows the user to check whether the bucket has versioning enabled (which keeps a history of object changes). |
"Sid": "ObjectLevelReadWriteAndTagging"
These actions allow the user to manage the actual data and metadata within the buckets.
| Policy Action | Description |
|---|---|
| s3:GetObject | Allows the user to download or read a file. |
| s3:GetObjectVersion | Allows the user to retrieve a specific historical version of a file (if versioning is on). |
| s3:PutObject | Allows the user to upload new files or overwrite existing ones. |
| s3:PutObjectTagging | Allows the user to add or change tags (key-value pairs used for organization or billing) on an object. |
| s3:GetObjectTagging | Allows the user to view the tags currently assigned to an object. |
What You Cannot Do:
| Restriction | Description |
|---|---|
| Delete anything | There are no s3:DeleteObject or s3:DeleteBucket permissions in the policy. |
| Manage Permissions | The user cannot view or change Access Control Lists (ACLs) or bucket policies; the policy grants no ACL or bucket-policy actions. |
| Access other Buckets | The user can only list, read, and write to buckets whose names start with the prefix edge-prod-b001-. |
| Modify Bucket Settings | Aside from viewing the versioning status, the user cannot change bucket configurations such as encryption, logging, or lifecycle rules. |
| Perform Administrative Tasks | The user cannot create new buckets or delete existing ones. |
| Object Permanent Deletion | Although the user can put (upload or overwrite) objects, they cannot remove them or manage object versions beyond reading them. |
| Manage Lifecycle or Encryption | The user cannot set up data archiving (Glacier), expiration rules, or server-side encryption settings. |
| CORS or Website Config | The user lacks permissions to configure the buckets for static website hosting or cross-origin resource sharing (CORS). |
Example Scenario:
Marcus is training models and needs to upload preprocessed datasets to raw-data/project-x/ and save model artifacts to models/project-x/. He can overwrite files during iterative development but cannot accidentally delete the team's shared datasets. (On a bucket without versioning, an overwrite still replaces the previous object permanently.)
Sample Permissions:
[
{
"Sid": "ListBucketsAndLocation",
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::*"
},
{
"Sid": "BucketLevelReadAndList",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketVersioning"
],
"Resource": "arn:aws:s3:::edge-prod-b001-*"
},
{
"Sid": "ObjectLevelReadWriteAndTagging",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:PutObject",
"s3:PutObjectTagging",
"s3:GetObjectTagging"
],
"Resource": [
"arn:aws:s3:::edge-prod-b001-*",
"arn:aws:s3:::edge-prod-b001-*/*"
]
}
]
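Because this level relies on implicit deny, a useful review check is that no granted action pattern, including wildcards such as s3:* or s3:Delete*, covers a destructive action. A minimal Python sketch; the DESTRUCTIVE_ACTIONS list is an illustrative assumption, not an exhaustive catalogue of S3 delete operations.

```python
from fnmatch import fnmatchcase

# Illustrative (not exhaustive) list of destructive S3 actions to audit for.
DESTRUCTIVE_ACTIONS = ["s3:DeleteObject", "s3:DeleteObjectVersion", "s3:DeleteBucket"]

# Action lists copied from the project-buckets-only and full samples in this guide.
PROJECT_BUCKETS_ONLY_ACTIONS = [
    "s3:ListAllMyBuckets", "s3:GetBucketLocation",
    "s3:ListBucket", "s3:GetBucketVersioning",
    "s3:GetObject", "s3:GetObjectVersion", "s3:PutObject",
    "s3:PutObjectTagging", "s3:GetObjectTagging",
]
FULL_ACTIONS = ["s3:*"]


def granted_destructive_actions(granted_patterns):
    """Return the destructive actions covered by any granted pattern.

    Patterns may contain IAM-style wildcards (s3:*, s3:Delete*); action names
    are compared case-insensitively, as IAM does.
    """
    return sorted(
        action
        for action in DESTRUCTIVE_ACTIONS
        if any(fnmatchcase(action.lower(), p.lower()) for p in granted_patterns)
    )


print(granted_destructive_actions(PROJECT_BUCKETS_ONLY_ACTIONS))  # []
print(granted_destructive_actions(FULL_ACTIONS))
# ['s3:DeleteBucket', 's3:DeleteObject', 's3:DeleteObjectVersion']
```

Matching on patterns rather than exact strings matters: a reviewer scanning only for the literal text "DeleteObject" would miss a grant of s3:*.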
Level 3: project-buckets-full
Purpose: The following IAM policy provides "Full Object Access" to your project buckets. It allows senior data scientists and leads to manage all data (including deleting objects and versions) while preventing any changes to the bucket's structure or configuration.
Typical Users:
Senior data scientists
ML team leads
Project managers (data cleanup)
Cost optimization roles
What You Can Do:
✅ Global Actions
✅ Bucket-Level Access
✅ Object-Level Management
"Sid": "AllowListAllBuckets"
Provides global visibility to see that S3 buckets exist and where they are located.
| Policy Action | Description |
|---|---|
| s3:ListAllMyBuckets | Allows the user to list all buckets in the AWS account (required for viewing buckets in the AWS Console). |
| s3:GetBucketLocation | Allows the user to see the specific AWS Region where a bucket is hosted. |
"Sid": "BucketLevelReadAndList"
Allows the user to see what is inside specific buckets (those starting with edge-prod-b001-).
| Policy Action | Description |
|---|---|
| s3:ListBucket | Allows the user to list the objects (files) within a bucket. |
| s3:ListBucketVersions | Allows the user to list all versions of every object in the bucket. |
| s3:GetBucketVersioning | Allows the user to check whether the bucket has versioning enabled or suspended. |
"Sid": "ObjectLevelFullManagement"
Grants full control over the lifecycle and metadata of files within the edge-prod-b001- buckets.
| Policy Action | Description |
|---|---|
| s3:GetObject | Allows reading/downloading a file. |
| s3:GetObjectVersion | Allows downloading a specific historical version of a file. |
| s3:PutObject | Allows uploading new files or updating existing ones. |
| s3:DeleteObject | Allows removing the current version of a file. |
| s3:DeleteObjectVersion | Allows permanently deleting a specific historical version of a file. |
| s3:PutObjectTagging | Allows adding or updating key-value tags on a file (often used for cost tracking or access control). |
| s3:GetObjectTagging | Allows viewing the tags associated with a file. |
| s3:AbortMultipartUpload | Allows canceling a large file upload that is in progress, which cleans up the temporary stored parts. |
What You Cannot Do:
❌ Read from or write to other buckets
❌ Administrative changes
❌ Bucket Creation or Deletion
❌ Permanent Deletions (MFA)
❌ Permissions Management
❌ Account-wide S3 Features
| Restriction | Description |
|---|---|
| Read from or write to other buckets | While the user can see the names of all buckets in the account, they cannot list the contents of, or download files from, any bucket whose name doesn't start with edge-prod-b001-. |
| Administrative changes | The user cannot delete buckets, change bucket policies, or modify encryption settings; the policy includes no s3:DeleteBucket or other bucket-configuration actions. |
| Bucket Creation or Deletion | The policy has no permissions to create a new bucket or delete an existing one. |
| Permanent Deletions (MFA) | If MFA delete is enabled on a bucket, the user cannot permanently delete object versions without an MFA token. |
| Permissions Management | The user cannot grant other people access to these files, because the policy includes no ACL or bucket-policy write actions. |
| Account-wide S3 Features | The user cannot manage account-level features such as S3 Access Points, S3 Object Lambda, or S3 Batch Operations. |
Example Scenario:
Elena is a senior ML engineer managing a project that generated 500GB of failed experiment artifacts over 6 months. She needs to delete these to reduce S3 costs while keeping successful model artifacts intact.
Why This Level Exists:
The S3 provisioner creates buckets with versioning disabled by default to save costs. However, customers can enable versioning. This level includes DeleteObjectVersion so senior team members can clean up versions if needed, without requiring full admin access.
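Why s3:DeleteObjectVersion matters: with versioning disabled, deleting an object removes it outright, but on a versioned bucket a plain delete only adds a delete marker, and the older versions remain (and remain billable) until each is deleted by version ID. The toy in-memory Python model below illustrates those semantics; it is not an S3 client, and the class name and object key are hypothetical.

```python
import itertools


class VersionedBucketModel:
    """Toy in-memory model of S3 versioning semantics (not an S3 client)."""

    def __init__(self):
        self._history = {}  # key -> list of (version_id, is_delete_marker)
        self._ids = itertools.count(1)

    def put_object(self, key):
        version_id = f"v{next(self._ids)}"
        self._history.setdefault(key, []).append((version_id, False))
        return version_id

    def delete_object(self, key):
        """Like s3:DeleteObject without a version ID: only adds a delete marker."""
        marker_id = f"v{next(self._ids)}"
        self._history.setdefault(key, []).append((marker_id, True))
        return marker_id

    def delete_object_version(self, key, version_id):
        """Like s3:DeleteObjectVersion: permanently removes one version."""
        self._history[key] = [
            entry for entry in self._history.get(key, []) if entry[0] != version_id
        ]

    def current_version(self, key):
        """Version a plain GET would return, or None if hidden by a delete marker."""
        history = self._history.get(key, [])
        if not history or history[-1][1]:
            return None
        return history[-1][0]

    def stored_versions(self, key):
        """Non-marker versions that still exist (and still incur storage cost)."""
        return [vid for vid, is_marker in self._history.get(key, []) if not is_marker]


bucket = VersionedBucketModel()
key = "experiments/run-042/model.tar.gz"  # hypothetical key
v1 = bucket.put_object(key)
v2 = bucket.put_object(key)
bucket.delete_object(key)
print(bucket.current_version(key), bucket.stored_versions(key))  # None ['v1', 'v2']
for version_id in (v1, v2):
    bucket.delete_object_version(key, version_id)
print(bucket.stored_versions(key))  # []
```

After the plain delete the object looks gone, yet both versions still occupy storage; only the version-level deletes actually reclaim it.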
Sample Permissions:
[
{
"Sid": "AllowListAllBuckets",
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::*"
},
{
"Sid": "BucketLevelReadAndList",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:ListBucketVersions",
"s3:GetBucketVersioning"
],
"Resource": "arn:aws:s3:::edge-prod-b001-*"
},
{
"Sid": "ObjectLevelFullManagement",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:PutObject",
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:PutObjectTagging",
"s3:GetObjectTagging",
"s3:AbortMultipartUpload"
],
"Resource": [
"arn:aws:s3:::edge-prod-b001-*",
"arn:aws:s3:::edge-prod-b001-*/*"
]
}
]
Level 4: full
Purpose: This policy allows the identified administrators and engineers to perform all S3 operations, including high-level bucket management like modifying policies, versioning, lifecycle rules, replication, and bucket deletion.
Typical Users:
MLOps platform administrators
DevOps engineers
Infrastructure team
Break-glass emergency access
What You Can Do:
This is a wildcard permission that covers all 100+ S3 operations.
✅ Manage Buckets: Create new buckets (in any Region) and delete existing ones. A bucket's Region is fixed when it is created and cannot be changed afterward.
✅ Manage Objects: Upload, download, copy, and permanently delete files (objects).
✅ Control Security: Modify Bucket Policies, Access Control Lists (ACLs), and Public Access Block settings, potentially making data public.
✅ Configure Features: Set up lifecycle rules (like auto-archiving to Glacier), enable versioning, configure replication, and manage encryption settings.
✅ Account-Level Tasks: View storage inventory, analytics, and metrics for the entire S3 service.
What You Cannot Do:
❌ Nothing - this is full S3 access within your environment
Example Scenario:
James is the MLOps platform owner who needs to configure S3 lifecycle policies to automatically archive old model artifacts to Glacier after 90 days, reducing storage costs by 70%.
Security Note: ⚠️ This level should be assigned sparingly. Most users need project-buckets-only or project-buckets-full.
Sample Permissions:
[
{
"Sid": "S3FullAccessPermissions",
"Effect": "Allow",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::*",
"arn:aws:s3:::*/*"
]
}
]
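James's scenario maps to a bucket lifecycle rule. A minimal Python sketch that builds the configuration in the shape boto3's put_bucket_lifecycle_configuration expects; the bucket name edge-prod-b001-models and the models/ prefix are hypothetical, and the API call itself is left commented out so the snippet runs without AWS credentials.

```python
import json

# Hypothetical bucket and prefix, chosen to mirror the models/ folder used earlier.
BUCKET = "edge-prod-b001-models"

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-model-artifacts-after-90-days",
            "Filter": {"Prefix": "models/"},
            "Status": "Enabled",
            # Move objects to S3 Glacier Flexible Retrieval 90 days after creation.
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))

# Applying it needs s3:PutLifecycleConfiguration (covered by s3:*). With boto3:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket=BUCKET, LifecycleConfiguration=lifecycle_configuration
#   )
```

This is also why the rule belongs at the full level: none of the project-bucket levels grant any lifecycle-configuration action.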
ECR Policies
ECR (Elastic Container Registry) stores your Docker images for ML training, inference, and pipeline components.
Enterprise Compliance Model
ECR policies follow a 4-level model that separates human access from automation access. This separation is critical for regulated industries (FinTech, Healthcare, Government), where compliance requires separation of duties.
Level Overview:
| Level | Name | Who | Purpose |
|---|---|---|---|
| 1 | read-only | Runtime environments | Pull images (pure consumers) |
| 2 | dev-read-write | Data scientists (humans) | Push/pull, create repos, trigger scans interactively |
| 3 | ci-read-write | CI/CD pipelines (automation) | Push/pull, create repos, validate lifecycle rules |
| 4 | full | MLOps administrators | Complete registry management including deletion |
Key Distinction (Level 2 vs Level 3):
dev-read-write: assigned to IAM users (humans)
ci-read-write: assigned to IAM roles (CI/CD automation)
While both levels share a core set of push/pull/discovery actions, they are shaped by who uses them:
Level 2 includes ecr:StartImageScan and ecr:DescribeImageScanFindings, because humans trigger and review scans interactively.
Level 3 includes ecr:GetLifecyclePolicyPreview, because CI pipelines validate lifecycle rules as part of infrastructure automation.
Level 3 omits the scan actions, because CI relies on ECR's scan-on-push setting.
This separation ensures audit trails clearly distinguish human from automated actions, which helps satisfy the separation-of-duties requirements of SOC2, HIPAA, and PCI-DSS.
Level 1: read-only
Purpose: Pull images for local development and testing
Typical Users:
Data scientists (local testing)
QA engineers
Security scanners
Developers onboarding to the platform
What You Can Do:
✅ Authenticate
✅ Pull Images
✅ View Metadata
✅ Check Security
"Sid": "AllowECRAuth"
Grants the basic permission required to authenticate with Amazon ECR. This is the "handshake" step needed before any other ECR action can be performed.
| Policy Action | Description |
|---|---|
| ecr:GetAuthorizationToken | Allows the user to retrieve a temporary, base64-encoded authorization token. The token is used with the Docker CLI (via aws ecr get-login-password) to authenticate your local environment to the registry. |
"Sid": "AllowReadOnlyPullAndMetadata"
Provides "Read-Only" access to ECR. It allows users to view repository details and pull (download) images, but does not allow them to upload (push), delete, or modify anything.
| Policy Action | Description |
|---|---|
| ecr:BatchCheckLayerAvailability | Allows the user to check whether the specific layers that make up a Docker image already exist in the repository. |
| ecr:GetDownloadUrlForLayer | Provides a URL to download a specific image layer; this background action is required for docker pull to work. |
| ecr:BatchGetImage | Allows the user to retrieve the detailed information (manifests) for a specific set of images so they can be downloaded. |
| ecr:DescribeRepositories | Allows the user to see a list of repositories within the registry and view their settings. |
| ecr:ListImages | Allows the user to view a list of all image tags and digests within a specific repository. |
| ecr:DescribeImages | Provides detailed metadata about specific images, such as the size, push date, and associated tags. |
| ecr:DescribeImageScanFindings | Allows the user to view the results of vulnerability scans performed on the images. |
What You Cannot Do:
❌ Upload Images
❌ Delete Content
❌ Modify Settings
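| Restriction | Description |
|---|---|
| Upload Images | The user cannot push new images or layers; the push actions (ecr:InitiateLayerUpload, ecr:UploadLayerPart, ecr:CompleteLayerUpload, ecr:PutImage) are not in the policy. |
| Delete Content | The user cannot delete images, tags, or repositories; actions such as ecr:BatchDeleteImage and ecr:DeleteRepository are not in the policy. |
| Modify Settings | The user cannot create repositories, change permissions, or update lifecycle policies; ecr:CreateRepository, ecr:SetRepositoryPolicy, and ecr:PutLifecyclePolicy are not in the policy. |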
Example Scenario:
Priya is a data scientist who needs to pull the team's base ML training image (acme-mlops-dev/ml-training:v2.1) to run experiments locally on her laptop.
Sample Permissions:
[
{
"Sid": "AllowECRAuth",
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken"
],
"Resource": "*"
},
{
"Sid": "AllowReadOnlyPullAndMetadata",
"Effect": "Allow",
"Action": [
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:DescribeImages",
"ecr:DescribeImageScanFindings"
],
"Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
}
]
Docker Command:
This command sequence authenticates your local Docker client with your private Amazon Elastic Container Registry (ECR) and then downloads the machine learning training image ml-training:v2.1 used in your MLOps development environment. The authorization token provided by AWS is valid for 12 hours; after that, run the login command again.
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.1
Explanation:
Part 1: Authentication
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
aws ecr get-login-password: This AWS CLI command retrieves a temporary registry password using the ECR GetAuthorizationToken API.
--region us-east-1: Specifies the AWS Region where your registry is hosted.
| (the pipe): Passes the password generated by the first command directly as input to the next command.
docker login: Starts the login process for a Docker registry.
--username AWS: For Amazon ECR, the username is always AWS.
--password-stdin: Tells Docker to read the password from standard input (the pipe), which keeps it off the command line and out of your shell history.
123456789012.dkr.ecr.us-east-1.amazonaws.com: The unique URI of your private registry. It follows the format <account_id>.dkr.ecr.<region>.amazonaws.com.
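The token behind this login can also be inspected programmatically. The GetAuthorizationToken API returns a base64-encoded user:password string in which the user is AWS; as we understand it, get-login-password prints the password half of that decoded string. The Python sketch below decodes such a token; it uses a fabricated token so it runs offline, and the boto3 lookup shown in a comment is the only part that would contact AWS.

```python
import base64


def split_authorization_token(authorization_token: str) -> tuple:
    """Decode an ECR authorizationToken into (username, password).

    The decoded value has the form "AWS:<password>"; split on the first colon only.
    """
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, password = decoded.split(":", 1)
    return username, password


# A real token would come from, e.g.:
#   boto3.client("ecr").get_authorization_token()["authorizationData"][0]["authorizationToken"]
fake_token = base64.b64encode(b"AWS:not-a-real-password").decode("ascii")
print(split_authorization_token(fake_token))  # ('AWS', 'not-a-real-password')
```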
Part 2: Pulling the Image
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.1
docker pull: The standard command to download an image from a registry.
acme-mlops-dev/ml-training: The name of the specific repository within your ECR registry.
:v2.1: The specific version tag of the image you want to download.
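The registry host, image reference, and repository ARN all follow fixed patterns, which is what lets the ${company_prefix}-${env}/* resource pattern in the sample policy line up with the repository you pull from. A Python sketch, assuming the standard aws partition (China and GovCloud Regions use other partitions) and company_prefix=acme-mlops, env=dev as in the examples above.

```python
import re
from fnmatch import fnmatchcase


def registry_uri(account_id: str, region: str) -> str:
    """Private ECR registry host, e.g. 123456789012.dkr.ecr.us-east-1.amazonaws.com."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def image_reference(account_id: str, region: str, repository: str, tag: str) -> str:
    """Full reference passed to `docker pull` / `docker push`."""
    return f"{registry_uri(account_id, region)}/{repository}:{tag}"


def repository_arn(account_id: str, region: str, repository: str) -> str:
    """Repository ARN in the standard `aws` partition, as matched by IAM policies."""
    return f"arn:aws:ecr:{region}:{account_id}:repository/{repository}"


# Resource pattern from the sample policy with assumed company_prefix=acme-mlops, env=dev.
POLICY_RESOURCE = "arn:aws:ecr:*:*:repository/acme-mlops-dev/*"

ref = image_reference("123456789012", "us-east-1", "acme-mlops-dev/ml-training", "v2.1")
print(ref)  # 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.1
arn = repository_arn("123456789012", "us-east-1", "acme-mlops-dev/ml-training")
print(fnmatchcase(arn, POLICY_RESOURCE))  # True
```

A repository outside the acme-mlops-dev/ namespace produces an ARN that does not match the pattern, so pulls from it fail under this policy.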
Level 2: dev-read-write
Purpose: This JSON policy grants the necessary permissions for developers to build, tag, and push images while maintaining read-only visibility and the ability to create new repositories. It explicitly excludes destructive or administrative actions like deleting repositories, modifying policies, or changing lifecycles.
Typical Users:
ML engineers (manual image builds)
DevOps engineers (troubleshooting)
Platform developers (base image maintenance)
Assignment: IAM Users only (not roles)
What You Can Do:
✅ Everything in read-only, PLUS:
✅ Push new Docker images
✅ Tag images
✅ Create new repositories
✅ Initiate image scans
"Sid": "ReadOnlyAndDiscovery"
| Policy Action | Description |
|---|---|
| ecr:GetAuthorizationToken | Obtain a temporary password to authenticate a Docker CLI to the registry (granted by the separate AllowECRAuth statement in the sample below). |
| ecr:DescribeRepositories | View metadata about existing repositories (e.g., URI, creation date). |
| ecr:DescribeImages | View metadata about images within a repository (e.g., push date, size, tags). |
| ecr:ListImages | Get a list of all image IDs in a repository. |
| ecr:BatchGetImage | Pull image manifest information for one or more images. |
| ecr:GetRepositoryPolicy | View the JSON resource-based policy attached to a repository. |
| ecr:GetLifecyclePolicy | View the rules that automatically clean up old images. |
| ecr:ListTagsForResource | View the tags (key-value pairs) assigned to an ECR resource. |
| ecr:DescribeImageScanFindings | View the security vulnerability reports for scanned images. |
"Sid": "PushAndTagImages"
| Policy Action | Description |
|---|---|
| ecr:BatchCheckLayerAvailability | Check whether specific image layers already exist in the registry (used during push). |
| ecr:GetDownloadUrlForLayer | Retrieve a URL to download specific image layers. |
| ecr:InitiateLayerUpload | The first step of the process required to upload new image layers to a repository. |
| ecr:UploadLayerPart | The second step of the process required to upload new image layers to a repository. |
| ecr:CompleteLayerUpload | The third step of the process required to upload new image layers to a repository. |
| ecr:PutImage | Finalize the upload by adding the image manifest to the repository. |
| ecr:TagResource | Add or update tags (metadata) on ECR resources such as repositories. |
"Sid": "ManageRepositoriesAndScans"
| Policy Action | Description |
|---|---|
| ecr:CreateRepository | Create entirely new, empty repositories. |
| ecr:StartImageScan | Manually trigger a vulnerability scan on an existing image. |
What You Cannot Do:
❌ Delete images, repositories, or lifecycle policies
❌ Modify repository policies
❌ Change lifecycle policies
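| Restriction | Description |
|---|---|
| Delete Resources | The policy lacks delete actions such as ecr:BatchDeleteImage and ecr:DeleteRepository. |
| Modify Policies | There are no ecr:SetRepositoryPolicy or ecr:DeleteRepositoryPolicy permissions, so repository access policies cannot be changed. |
| Lifecycle Management | Users can view lifecycle policies (ecr:GetLifecyclePolicy) but cannot create or change them (no ecr:PutLifecyclePolicy). |
| Public ECR | This policy applies to private ECR. It does not grant permissions for ecr-public actions. |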
Example Scenario:
Tom is an ML engineer who built a new training image with updated dependencies. He needs to push it to ECR so the team can test it before integrating into the CI/CD pipeline.
Sample Permissions:
[
{
"Sid": "AllowECRAuth",
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken"
],
"Resource": "*"
},
{
"Sid": "ReadOnlyAndDiscovery",
"Effect": "Allow",
"Action": [
"ecr:DescribeRepositories",
"ecr:DescribeImages",
"ecr:ListImages",
"ecr:BatchGetImage",
"ecr:GetRepositoryPolicy",
"ecr:GetLifecyclePolicy",
"ecr:ListTagsForResource",
"ecr:DescribeImageScanFindings"
],
"Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
},
{
"Sid": "PushAndTagImages",
"Effect": "Allow",
"Action": [
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:InitiateLayerUpload",
"ecr:UploadLayerPart",
"ecr:CompleteLayerUpload",
"ecr:PutImage",
"ecr:TagResource"
],
"Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
},
{
"Sid": "ManageRepositoriesAndScans",
"Effect": "Allow",
"Action": [
"ecr:CreateRepository",
"ecr:StartImageScan"
],
"Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
}
]
Docker Commands:
# Build and push
docker build -t ml-training:v2.2 .
docker tag ml-training:v2.2 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.2
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.2
Why No Delete Permission: Image cleanup should be handled by ECR Lifecycle Policies (automated, safe) rather than manual deletion (error-prone, risky).
Level 3: ci-read-write
Purpose: CI/CD pipelines building and pushing images automatically
Typical Users:
GitHub Actions workflows
Jenkins build jobs
CodePipeline stages
GitLab CI runners
Assignment: IAM Roles only (not users)
What You Can Do:
✅ Push images from automated builds
✅ Tag images with build metadata
✅ Create repositories on demand
✅ Read lifecycle policies and lifecycle-policy preview results (for infrastructure-as-code validation)
How It Differs from dev-read-write:
✅ Adds ecr:GetLifecyclePolicyPreview: CI pipelines validate lifecycle rules as part of infrastructure automation.
❌ Removes ecr:StartImageScan and ecr:DescribeImageScanFindings: CI relies on ECR's scan-on-push setting rather than triggering scans directly.
What You Cannot Do:
❌ No Deletion: Actions like ecr:BatchDeleteImage or ecr:DeleteRepository are not included, preventing runners from removing version history or entire projects.
❌ No Security Modification: ecr:SetRepositoryPolicy and ecr:DeleteRepositoryPolicy are excluded, ensuring the pipeline cannot change who has access to the images.
❌ No Lifecycle Changes: While the runner can read lifecycle policies, it cannot modify or delete them (no ecr:PutLifecyclePolicy or ecr:DeleteLifecyclePolicy), ensuring automated cleanup rules remain intact.
Example Scenario:
A GitHub Actions workflow automatically builds a new training image on every merge to main, tags it with the git commit SHA, and pushes it to ECR for deployment.
Sample Permissions:
[
{
"Sid": "AllowECRAuth",
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken"
],
"Resource": "*"
},
{
"Sid": "AllowRepositoryCreation",
"Effect": "Allow",
"Action": [
"ecr:CreateRepository"
],
"Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
},
{
"Sid": "ContinuousIntegrationReadWrite",
"Effect": "Allow",
"Action": [
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:GetRepositoryPolicy",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:DescribeImages",
"ecr:BatchGetImage",
"ecr:GetLifecyclePolicy",
"ecr:GetLifecyclePolicyPreview",
"ecr:ListTagsForResource",
"ecr:InitiateLayerUpload",
"ecr:UploadLayerPart",
"ecr:CompleteLayerUpload",
"ecr:PutImage",
"ecr:TagResource"
],
"Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
}
]
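The Level 2 vs Level 3 differences described earlier can be checked mechanically by comparing the two samples' action lists as sets. A Python sketch; the action lists are copied from the dev-read-write and ci-read-write samples in this guide.

```python
# Action lists copied from the dev-read-write and ci-read-write samples above.
DEV_READ_WRITE = {
    "ecr:GetAuthorizationToken",
    "ecr:DescribeRepositories", "ecr:DescribeImages", "ecr:ListImages",
    "ecr:BatchGetImage", "ecr:GetRepositoryPolicy", "ecr:GetLifecyclePolicy",
    "ecr:ListTagsForResource", "ecr:DescribeImageScanFindings",
    "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer",
    "ecr:InitiateLayerUpload", "ecr:UploadLayerPart", "ecr:CompleteLayerUpload",
    "ecr:PutImage", "ecr:TagResource",
    "ecr:CreateRepository", "ecr:StartImageScan",
}
CI_READ_WRITE = {
    "ecr:GetAuthorizationToken",
    "ecr:CreateRepository",
    "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer",
    "ecr:GetRepositoryPolicy", "ecr:DescribeRepositories", "ecr:ListImages",
    "ecr:DescribeImages", "ecr:BatchGetImage", "ecr:GetLifecyclePolicy",
    "ecr:GetLifecyclePolicyPreview", "ecr:ListTagsForResource",
    "ecr:InitiateLayerUpload", "ecr:UploadLayerPart", "ecr:CompleteLayerUpload",
    "ecr:PutImage", "ecr:TagResource",
}

print("Only in ci-read-write:", sorted(CI_READ_WRITE - DEV_READ_WRITE))
# Only in ci-read-write: ['ecr:GetLifecyclePolicyPreview']
print("Only in dev-read-write:", sorted(DEV_READ_WRITE - CI_READ_WRITE))
# Only in dev-read-write: ['ecr:DescribeImageScanFindings', 'ecr:StartImageScan']
print("Shared:", len(DEV_READ_WRITE & CI_READ_WRITE))  # Shared: 16
```

Running a check like this in CI whenever either policy changes keeps the documented distinction and the actual JSON from drifting apart.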
"Sid": "AllowRepositoryCreation"
| Policy Action | Description |
|---|---|
| ecr:CreateRepository | Allows the user to create a new, empty repository to store Docker or OCI-compliant images. |
"Sid": "ContinuousIntegrationReadWrite"
Authentication (granted by the separate AllowECRAuth statement in the sample above)
| Policy Action | Description |
|---|---|
| ecr:GetAuthorizationToken | Allows the user to request a short-lived password (token) to authenticate a Docker CLI client against ECR. |
Read/Pull Actions
| Policy Action | Description |
|---|---|
| ecr:BatchCheckLayerAvailability | Checks whether specific image layers already exist in the repository. |
| ecr:GetDownloadUrlForLayer | Retrieves a URL to download a specific image layer. |
| ecr:GetRepositoryPolicy | Allows the user to view the resource-based permissions policy of a repository. |
| ecr:DescribeRepositories | Returns metadata about repositories (e.g., creation date, URI, and settings). |
| ecr:ListImages | Lists basic information about the images stored in a repository. |
| ecr:DescribeImages | Provides detailed metadata about images, such as size, push date, and tags. |
| ecr:BatchGetImage | Allows the user to retrieve the image manifest or configuration for one or more images (required for pulling). |
| ecr:GetLifecyclePolicy | Retrieves the current lifecycle rules (which automate image deletion). |
| ecr:GetLifecyclePolicyPreview | Allows the user to see the results of a lifecycle policy preview before the policy is applied. |
| ecr:ListTagsForResource | Displays the tags (metadata) associated with a specific ECR repository. |
Write/Push Actions
| Policy Action | Description |
|---|---|
| ecr:InitiateLayerUpload | Starts the multi-step process of uploading an image layer. |
| ecr:UploadLayerPart | Allows the user to upload a specific segment of an image layer. |
| ecr:CompleteLayerUpload | Informs ECR that all parts of a layer have been uploaded and can be finalized. |
| ecr:PutImage | Finalizes the push by uploading the image manifest, making the image available in the repository. |
| ecr:TagResource | Allows the user to add or update metadata tags on the repository itself. |
GitHub Actions Example:
# Assumes earlier steps configured AWS credentials for the ci-read-write role and set ECR_REGISTRY.
- name: Login to ECR
  run: |
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $ECR_REGISTRY
- name: Build and push
  run: |
    docker build -t $ECR_REGISTRY/acme-mlops-dev/ml-training:$GITHUB_SHA .
    docker push $ECR_REGISTRY/acme-mlops-dev/ml-training:$GITHUB_SHA
Compliance Note: In regulated industries, auditors can trace:
Human actions: IAM user CloudTrail logs (dev-read-write)
Automated actions: IAM role CloudTrail logs (ci-read-write)
This separation helps satisfy the separation-of-duties requirements of SOC2, HIPAA, and PCI-DSS.
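In CloudTrail, this distinction shows up in each record's userIdentity.type: calls made with IAM user credentials carry "IAMUser", while calls made through an assumed IAM role carry "AssumedRole". A minimal Python sketch that tallies trimmed records by actor kind; the records are hypothetical, and the mapping assumes this guide's convention (humans as IAM users, CI/CD as roles). Teams whose humans sign in through SSO or federation also appear as assumed roles and would need a finer rule, for example one keyed on the role name.

```python
from collections import Counter


def actor_kind(record: dict) -> str:
    """Classify a CloudTrail record as 'human', 'automation', or 'other'.

    Assumes humans use IAM users and CI/CD uses IAM roles, as in this guide.
    """
    identity_type = record.get("userIdentity", {}).get("type")
    if identity_type == "IAMUser":
        return "human"
    if identity_type == "AssumedRole":
        return "automation"
    return "other"


# Hypothetical, heavily trimmed CloudTrail records for illustration only.
records = [
    {"eventSource": "ecr.amazonaws.com", "eventName": "PutImage",
     "userIdentity": {"type": "IAMUser", "userName": "tom"}},
    {"eventSource": "ecr.amazonaws.com", "eventName": "PutImage",
     "userIdentity": {"type": "AssumedRole",
                      "arn": "arn:aws:sts::123456789012:assumed-role/ci-read-write/gha-run"}},
    {"eventSource": "ecr.amazonaws.com", "eventName": "StartImageScan",
     "userIdentity": {"type": "IAMUser", "userName": "tom"}},
]

print(dict(Counter(actor_kind(r) for r in records)))  # {'human': 2, 'automation': 1}
```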
Level 4: full
This policy uses the wildcard ecr:* to grant all available permissions within the Amazon ECR service.
Purpose: Administrative access for repository management
Typical Users:
MLOps platform administrators
DevOps team leads
Security engineers (policy management)
What You Can Do:
| Functionality | Description |
|---|---|
| ✅ Delete Images and Repositories | Delete multiple specified images within a repository, or permanently remove an entire repository. |
| ✅ Modify Repository Policies | Apply or change the resource-based policies that control who can access specific repositories, and remove existing repository access policies. |
| ✅ Configure Lifecycle Policies | Create or update rules that automatically expire old images based on age or count, and view the current automated cleanup rules. |
| ✅ Set Up Cross-Region Replication | Configure settings that automatically copy images to other AWS Regions or accounts. |
| ✅ Manage Image Scanning Settings | Enable or disable automatic vulnerability scanning on image push, and manually trigger a security scan for a specific image. |
| ✅ Repository and Image Management | Create new private repositories, list and view metadata for all repositories and images, and push new container images or update existing ones. |
What You Cannot Do:
❌ No Restrictions - This policy is designed for full administrative access; there are no denied actions within the ECR service scope.
Example Scenario:
Lisa is the platform administrator who needs to configure an ECR Lifecycle Policy to automatically delete untagged images after 7 days and keep only the last 10 tagged images per repository, reducing storage costs.
Security Note: ⚠️ Most users need read-only or dev-read-write. Reserve full access for platform administrators. Docker CLI authentication still depends on ecr:GetAuthorizationToken, which ecr:* already includes.
Sample Permissions:
[
{
"Sid": "FullECRAdminAccess",
"Effect": "Allow",
"Action": [
"ecr:*"
],
"Resource": "*"
}
]
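Lisa's scenario corresponds to an ECR lifecycle policy with two rules. A minimal Python sketch that builds the policy text; the assumption that release tags start with v (as in v2.1 and v2.2 earlier) is ours, and CI tags such as git commit SHAs would need a separate rule. The boto3 call is commented out so the snippet runs offline.

```python
import json

ecr_lifecycle_policy = {
    "rules": [
        {
            "rulePriority": 1,
            "description": "Expire untagged images 7 days after push",
            "selection": {
                "tagStatus": "untagged",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": 7,
            },
            "action": {"type": "expire"},
        },
        {
            "rulePriority": 2,
            "description": "Keep only the 10 most recent images tagged v*",
            "selection": {
                "tagStatus": "tagged",
                # Assumption: release tags start with "v" (v2.1, v2.2, ...).
                "tagPrefixList": ["v"],
                "countType": "imageCountMoreThan",
                "countNumber": 10,
            },
            "action": {"type": "expire"},
        },
    ]
}

policy_text = json.dumps(ecr_lifecycle_policy, indent=2)
print(policy_text)

# Applying it needs ecr:PutLifecyclePolicy (covered by ecr:*). With boto3:
#   import boto3
#   boto3.client("ecr").put_lifecycle_policy(
#       repositoryName="acme-mlops-dev/ml-training",
#       lifecyclePolicyText=policy_text,
#   )
```

Keeping cleanup in a lifecycle policy rather than in manual deletes is what allows the dev-read-write and ci-read-write levels to omit delete actions entirely.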
Pipeline Policies
SageMaker Pipeline policies control access to ML training and deployment workflows.
Enterprise Compliance Model
Pipeline policies follow a 4-level model that separates human access from automation access - mirroring the ECR pattern for consistency and compliance.
Key Distinction:
project-dev â Assigned to IAM users (humans)
project-ci â Assigned to IAM roles (CI/CD automation)
Both have identical permissions, but the assignment pattern ensures audit trails clearly show human vs automated actions.
Level 1: read-only
Purpose: This policy is designed for read-only governance and monitoring across CI/CD and Machine Learning workflows. It allows a user to audit the status, history, and logs of automated pipelines without the ability to create, modify, or delete resources.
The primary goal is to provide full visibility into the state of AWS CodePipeline and SageMaker Model Building Pipelines. It is ideal for auditors, project managers, or automated monitoring tools that need to track deployment progress and execution history across an entire AWS account.
Typical Users:
Auditors and compliance reviewers
Model risk managers
Executive stakeholders
New team members learning the platform
Cross-team visibility roles
What You Can Do (CI/CD pipelines):
✅ View the full architecture and configuration of any pipeline.
✅ Access the complete execution history for auditing purposes.
✅ Monitor the live progress of a running pipeline and review its logs.
✅ Examine pipeline steps and configurations (e.g., environment variables, source branches).
✅ Access and export lists of pipelines and executions for compliance reporting.
What You Cannot Do (Restricted actions for CI/CD pipelines):
❌ Create or Modify: You cannot change the pipeline’s structure, add new stages, or delete existing ones.
❌ Start Executions: You are barred from manually triggering a new pipeline run.
❌ Stop/Cancel: You cannot intervene in an active process to stop or roll it back.
❌ Delete: You do not have permission to remove pipeline resources or execution history.
What You Can Do (SageMaker pipelines):
✅ List all available pipelines in the account to provide a high-level overview for audit purposes.
✅ View the history of all pipeline runs, allowing auditors to see when and how many times a workflow was triggered.
✅ Examine the individual steps within a specific execution to verify that each stage (e.g., training, processing) completed as expected.
✅ Retrieve the metadata and configuration of a pipeline definition to review its architectural design.
✅ View the current status (e.g., Succeeded, Failed) and specific details of a single execution run.
✅ Access the exact version of the pipeline definition used for a specific historical run, ensuring the “as-run” configuration is verifiable.
What You Cannot Do (Restricted actions for SageMaker pipelines):
❌ sagemaker:CreatePipeline: Not granted, which prevents the creation of new workflows that could bypass established compliance checks.
❌ sagemaker:UpdatePipeline: Not granted, so existing validated pipeline definitions remain immutable.
❌ sagemaker:StartPipelineExecution: Not granted, so users cannot trigger new runs, preventing unauthorized compute costs or production changes.
❌ sagemaker:StopPipelineExecution: Not granted, so users cannot interfere with active, ongoing production workloads.
❌ sagemaker:DeletePipeline: Not granted, protecting historical audit trails and definitions from being permanently removed.
Example Scenario:
Rachel is a model risk manager who needs to review all ML training pipelines quarterly to ensure they meet compliance requirements for bias detection and data validation. She needs to see pipeline configurations and execution logs but should not be able to trigger or modify any workflows.
Sample Permissions:
[
{
"Sid": "CodePipelineReadOnly",
"Effect": "Allow",
"Action": [
"codepipeline:GetPipeline",
"codepipeline:GetPipelineExecution",
"codepipeline:GetPipelineState",
"codepipeline:ListPipelines",
"codepipeline:ListPipelineExecutions",
"codepipeline:ListActionTypes",
"codepipeline:ListTagsForResource"
],
"Resource": "*"
},
{
"Sid": "CodeBuildReadOnly",
"Effect": "Allow",
"Action": [
"codebuild:BatchGetBuilds",
"codebuild:ListBuilds"
],
"Resource": "*"
},
{
"Sid": "PipelineLogsReadOnly",
"Effect": "Allow",
"Action": [
"logs:GetLogEvents",
"logs:DescribeLogStreams"
],
"Resource": "*"
},
{
"Sid": "SagemakerPipelineReadOnly",
"Effect": "Allow",
"Action": [
"sagemaker:ListPipelines",
"sagemaker:ListPipelineExecutions",
"sagemaker:ListPipelineExecutionSteps",
"sagemaker:DescribePipeline",
"sagemaker:DescribePipelineExecution",
"sagemaker:DescribePipelineDefinitionForExecution",
"sagemaker:GetSearchSuggestions"
],
"Resource": "*"
}
]
"Sid": "CodePipelineReadOnly"
Grants read-only access to all AWS CodePipeline resources across the account. Uses Resource: * intentionally: auditors and governance teams need account-wide visibility across all tenants to perform compliance reviews.
| Policy Action | Description |
|---|---|
| codepipeline:GetPipeline | View the detailed definition and structure of a pipeline. |
| codepipeline:GetPipelineExecution | View the status and details of a specific execution instance. |
| codepipeline:GetPipelineState | Monitor the real-time status of each stage and action within a pipeline. |
| codepipeline:ListPipelines | List all available pipelines in the account across all tenants. |
| codepipeline:ListPipelineExecutions | View the history of all past and current pipeline runs. |
| codepipeline:ListActionTypes | See what types of actions (e.g., Build, Deploy, Test) are available for use. |
| codepipeline:ListTagsForResource | Review metadata tags used for cost tracking and organizational governance. |
"Sid": "CodeBuildReadOnly"
Grants read-only access to all AWS CodeBuild projects across the account. Uses Resource: * for the same governance reason: auditors need visibility into build jobs across all tenants.
| Policy Action | Description |
|---|---|
| codebuild:BatchGetBuilds | View details of specific build jobs triggered by pipelines. |
| codebuild:ListBuilds | List all build jobs for visibility into build history across all tenants. |
"Sid": "PipelineLogsReadOnly"
Grants read-only access to CloudWatch Logs for reviewing pipeline and build execution logs. Uses Resource: * because log group names are generated by AWS services at runtime and do not follow a predictable naming pattern.
| Policy Action | Description |
|---|---|
| logs:GetLogEvents | Access execution logs for audit trails and compliance reviews. |
| logs:DescribeLogStreams | List available log streams to locate relevant log output. |
"Sid": "SagemakerPipelineReadOnly"
Grants read-only access to all SageMaker Pipelines across the account. Uses Resource: * intentionally: auditors need to review ML pipeline configurations, execution history, and step-level details across all tenants for compliance verification.
| Policy Action | Description |
|---|---|
| sagemaker:ListPipelines | List all available pipelines in the account to provide a high-level overview for audit purposes. |
| sagemaker:ListPipelineExecutions | View the history of all pipeline runs, allowing auditors to see when and how many times a workflow was triggered. |
| sagemaker:ListPipelineExecutionSteps | Examine the individual steps within a specific execution to verify that each stage (e.g., training, processing) completed as expected. |
| sagemaker:DescribePipeline | Retrieve the metadata and configuration of a pipeline definition to review its architectural design. |
| sagemaker:DescribePipelineExecution | View the current status (e.g., Succeeded, Failed) and specific details of a single execution run. |
| sagemaker:DescribePipelineDefinitionForExecution | Access the exact version of the pipeline definition used for a specific historical run, ensuring the “as-run” configuration is verifiable. |
| sagemaker:GetSearchSuggestions | Use autocomplete/suggestions when searching for SageMaker resources. |
Resource Scope:
All four Sids use Resource: * intentionally. Read-only governance access requires account-wide visibility across all tenants: auditors must be able to review any team’s pipelines, builds, and execution logs to perform compliance assessments. This is consistent with how AWS managed policies like ReadOnlyAccess and AWSCloudTrail_ReadOnlyAccess are designed.
Compliance Use Case: In regulated industries, auditors must verify that ML pipelines include required validation steps (data quality checks, bias detection, model explainability). Read-only access enables these reviews without risk of accidental modifications.
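Because the read-only level is defined by what it omits, a simple automated check can catch drift before a write action slips in. A minimal sketch, assuming the AWS naming convention that read actions begin with Get, List, Describe, or BatchGet; this is a heuristic rather than an AWS rule, so anything it flags still needs human review. find_write_actions is a hypothetical helper.

```python
# Heuristic: read-only AWS actions conventionally start with one of these verbs.
READ_PREFIXES = ("Get", "List", "Describe", "BatchGet")


def find_write_actions(statements):
    """Return every allowed action whose name does not start with a read verb.

    Wildcards such as "sagemaker:*" are flagged too, because their name part
    ("*") does not start with a read verb.
    """
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            _, _, name = action.partition(":")
            if not name.startswith(READ_PREFIXES):
                flagged.append(action)
    return flagged


# Abbreviated copy of two statements from the read-only sample policy.
read_only = [
    {"Sid": "CodeBuildReadOnly", "Effect": "Allow",
     "Action": ["codebuild:BatchGetBuilds", "codebuild:ListBuilds"], "Resource": "*"},
    {"Sid": "SagemakerPipelineReadOnly", "Effect": "Allow",
     "Action": ["sagemaker:DescribePipelineDefinitionForExecution",
                "sagemaker:GetSearchSuggestions"], "Resource": "*"},
]

# The same policy after a hypothetical careless edit.
drifted = read_only + [
    {"Effect": "Allow", "Action": "sagemaker:StartPipelineExecution", "Resource": "*"}
]

print(find_write_actions(read_only))  # []
print(find_write_actions(drifted))    # ['sagemaker:StartPipelineExecution']
```

Running a check like this in the pipeline that publishes the policy keeps the read-only guarantee from depending on reviewers noticing a single added line.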
Level 2: project-dev
Purpose: Human developers creating and managing ML pipelines manually. This policy grants the ability to build, iterate on, and execute both CI/CD delivery pipelines (CodePipeline/CodeBuild) and ML workflow pipelines (SageMaker) within a tenant-scoped boundary.
Typical Users:
Data scientists (manual pipeline runs and experimentation)
ML engineers (pipeline development and iteration)
Research teams (prototyping ML workflows)
Assignment: IAM Users only (not roles)
How It Differs from read-only:
Adds CodePipeline write actions: create, update, start, stop, and retry pipelines (read-only has view-only access)
Adds CodeBuild write actions: start and stop builds (read-only can only view build results)
Adds SageMaker Pipeline write actions: create and update pipelines, start and stop executions (read-only can only view pipeline state)
Adds codepipeline:TagResource: organize pipeline resources with metadata tags
Tightens resource scoping: CodePipeline, CodeBuild, and SageMaker are scoped to {company_prefix}-{env}-{tenant_id}-* (read-only uses Resource: * for account-wide governance visibility)
Removes sagemaker:DescribePipelineDefinitionForExecution and sagemaker:GetSearchSuggestions: these are governance/audit actions not needed for active development
What You Can Do (CI/CD pipelines):
✅ Create and update CodePipeline definitions for your project
✅ Start and stop pipeline executions manually
✅ Retry failed stages during development
✅ Start and monitor CodeBuild projects
✅ View build logs for debugging
✅ Tag pipeline resources for organization
What You Can Do (SageMaker pipelines):
✅ Create and update SageMaker Pipeline definitions
✅ Start and stop pipeline executions manually
✅ View execution history, step details, and pipeline configurations
✅ Iterate on pipeline design with different parameters
What You Cannot Do:
❌ No Deletion: codepipeline:DeletePipeline and sagemaker:DeletePipeline are excluded; pipelines are removed through admin-level access only, protecting execution history and audit trails.
❌ No Cross-Tenant Access: Resource scoping limits access to pipelines matching {company_prefix}-{env}-{tenant_id}-*, preventing access to other teams’ workflows.
❌ No Platform-Wide Settings: Cannot modify account-level CodePipeline or SageMaker configurations.
Example Scenario:
Tom is an ML engineer developing a new fraud detection pipeline. He creates a SageMaker Pipeline definition in Python, triggers training runs with different hyperparameters, and monitors execution steps, while the CodePipeline he set up automatically rebuilds the pipeline on each commit to his feature branch.
Sample Permissions:
[
{
"Sid": "CodePipelineDevAccess",
"Effect": "Allow",
"Action": [
"codepipeline:CreatePipeline",
"codepipeline:UpdatePipeline",
"codepipeline:GetPipeline",
"codepipeline:GetPipelineExecution",
"codepipeline:GetPipelineState",
"codepipeline:ListPipelines",
"codepipeline:ListPipelineExecutions",
"codepipeline:ListActionTypes",
"codepipeline:ListTagsForResource",
"codepipeline:StartPipelineExecution",
"codepipeline:StopPipelineExecution",
"codepipeline:RetryStageExecution",
"codepipeline:TagResource"
],
"Resource": "arn:aws:codepipeline:*:*:{company_prefix}-{env}-{tenant_id}-*"
},
{
"Sid": "CodeBuildDevAccess",
"Effect": "Allow",
"Action": [
"codebuild:StartBuild",
"codebuild:StopBuild",
"codebuild:BatchGetBuilds",
"codebuild:ListBuilds"
],
"Resource": "arn:aws:codebuild:*:*:project/{company_prefix}-{env}-{tenant_id}-*"
},
{
"Sid": "PipelineLogsAccess",
"Effect": "Allow",
"Action": [
"logs:GetLogEvents",
"logs:DescribeLogStreams"
],
"Resource": "*"
},
{
"Sid": "SagemakerPipelineDevAccess",
"Effect": "Allow",
"Action": [
"sagemaker:CreatePipeline",
"sagemaker:UpdatePipeline",
"sagemaker:DescribePipeline",
"sagemaker:StartPipelineExecution",
"sagemaker:StopPipelineExecution",
"sagemaker:DescribePipelineExecution",
"sagemaker:ListPipelineExecutions",
"sagemaker:ListPipelineExecutionSteps"
],
"Resource": "arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{tenant_id}-*"
}
]
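The sample above is a template: {company_prefix}, {env}, and {tenant_id} must be substituted before the policy is attached. A minimal sketch of that substitution, assuming the values come from onboarding configuration (acme, dev, and a001 are borrowed from the GitHub Actions example in Level 3). It walks the parsed statements and rewrites strings with a regular expression instead of calling str.format on raw JSON text, which would trip over the JSON's own braces. render is a hypothetical helper.

```python
import json
import re

# Matches placeholders such as {company_prefix}, {env}, or {tenant_id}.
PLACEHOLDER = re.compile(r"\{([a-z_]+)\}")


def render(node, values):
    """Recursively substitute {name} placeholders in every string of a policy.

    A placeholder with no entry in `values` raises KeyError, so an unrendered
    template cannot be attached by accident.
    """
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: values[m.group(1)], node)
    if isinstance(node, list):
        return [render(item, values) for item in node]
    if isinstance(node, dict):
        return {key: render(val, values) for key, val in node.items()}
    return node


# One statement from the project-dev sample, still in template form.
template = [
    {
        "Sid": "SagemakerPipelineDevAccess",
        "Effect": "Allow",
        "Action": ["sagemaker:DescribePipeline"],
        "Resource": "arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{tenant_id}-*",
    }
]
values = {"company_prefix": "acme", "env": "dev", "tenant_id": "a001"}

# IAM expects the statements inside a Version/Statement policy document.
policy = {"Version": "2012-10-17", "Statement": render(template, values)}
print(json.dumps(policy, indent=2))
```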
"Sid": "CodePipelineDevAccess"
Grants development-level access to AWS CodePipeline, scoped to pipelines matching the tenant’s naming convention. This ensures developers can only manage their own team’s CI/CD pipelines.
Pipeline Management
| Policy Action | Description |
|---|---|
| codepipeline:CreatePipeline | Create new CI/CD pipeline definitions for the project. |
| codepipeline:UpdatePipeline | Modify existing pipeline stages, actions, and configurations. |
| codepipeline:StartPipelineExecution | Manually trigger a pipeline run. |
| codepipeline:StopPipelineExecution | Cancel a running pipeline execution. |
| codepipeline:RetryStageExecution | Re-run a failed stage without restarting the entire pipeline. |
| codepipeline:TagResource | Add or update metadata tags on pipeline resources. |
Read/Monitor Actions
| Policy Action | Description |
|---|---|
| codepipeline:GetPipeline | View the detailed definition and structure of a pipeline. |
| codepipeline:GetPipelineExecution | View the status and details of a specific execution. |
| codepipeline:GetPipelineState | Monitor the real-time status of each stage and action. |
| codepipeline:ListPipelines | List all available pipelines in the account. |
| codepipeline:ListPipelineExecutions | View the history of all pipeline runs. |
| codepipeline:ListActionTypes | See what types of actions (Build, Deploy, Test) are available. |
| codepipeline:ListTagsForResource | Review metadata tags for cost tracking and governance. |
"Sid": "CodeBuildDevAccess"
Grants development-level access to AWS CodeBuild projects, scoped to build projects matching the tenant’s naming convention.
| Policy Action | Description |
|---|---|
| codebuild:StartBuild | Trigger a CodeBuild project to compile, test, or package code. |
| codebuild:StopBuild | Cancel a running build job. |
| codebuild:BatchGetBuilds | View details of specific build jobs triggered by the pipeline. |
| codebuild:ListBuilds | List all build jobs for visibility into build history. |
"Sid": "PipelineLogsAccess"
Grants read access to CloudWatch Logs for debugging pipeline and build execution. This Sid uses Resource: * because log group names are generated by CodePipeline and CodeBuild at runtime and do not follow a predictable tenant-scoped naming pattern.
| Policy Action | Description |
|---|---|
| logs:GetLogEvents | Access execution logs for debugging and troubleshooting. |
| logs:DescribeLogStreams | List available log streams to locate relevant log output. |
"Sid": "SagemakerPipelineDevAccess"
Grants development-level access to SageMaker ML pipelines, scoped to the tenant’s project namespace.
Pipeline Management
| Policy Action | Description |
|---|---|
| sagemaker:CreatePipeline | Define a new SageMaker Pipeline for ML workflows (training, processing, evaluation). |
| sagemaker:UpdatePipeline | Modify an existing pipeline definition to iterate on the workflow design. |
| sagemaker:StartPipelineExecution | Trigger a pipeline run with specified parameters (e.g., hyperparameters, data paths). |
| sagemaker:StopPipelineExecution | Cancel a running execution to stop compute costs or abort a misconfigured run. |
Read/Monitor Actions
| Policy Action | Description |
|---|---|
| sagemaker:DescribePipeline | View the metadata and configuration of a pipeline definition. |
| sagemaker:DescribePipelineExecution | View the status, parameters, and details of a specific execution run. |
| sagemaker:ListPipelineExecutions | View the history of all runs for a pipeline to track iteration progress. |
| sagemaker:ListPipelineExecutionSteps | Examine individual steps within an execution to identify which step failed or succeeded. |
Resource Scope:
Three of the four Sids are tenant-scoped to {company_prefix}-{env}-{tenant_id}-*, ensuring developers can only access their own team’s resources. The exception is PipelineLogsAccess, which uses Resource: * because CloudWatch log group names are generated by AWS services at runtime and do not follow a predictable tenant-scoped naming pattern. This is a known AWS limitation; log access can be further restricted via log group resource policies as the platform matures.
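To see how the tenant prefix keeps teams apart, the sketch below matches concrete pipeline ARNs against the rendered SageMaker resource pattern. It uses Python's fnmatchcase as a simplified stand-in for IAM's ARN wildcard matching (IAM applies extra per-segment rules, so treat this as an approximation), and the tenant IDs, region, and account number are illustrative. resource_in_scope is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def resource_in_scope(resource_arn, pattern):
    """Approximate IAM resource matching: * spans any characters, ? one character."""
    return fnmatchcase(resource_arn, pattern)


# Rendered pattern for tenant a001 (see the SagemakerPipelineDevAccess statement).
pattern = "arn:aws:sagemaker:*:*:pipeline/acme-dev-a001-*"

base = "arn:aws:sagemaker:us-west-2:123456789012:pipeline/"
own = base + "acme-dev-a001-fraud-detection"
other_team = base + "acme-dev-b002-churn-model"
lookalike = base + "acme-dev-a0012-forecast"      # tenant ID that merely starts with a001
execution = own + "/execution/abc123"             # executions live under the pipeline ARN

for arn in (own, other_team, lookalike, execution):
    print(resource_in_scope(arn, pattern), arn.split(":", 5)[-1])
```

The look-alike case shows why the pattern keeps the hyphen after {tenant_id}: without it, tenant a001 would also match a0012. Tenant IDs that themselves contain hyphens could still overlap, so the sketch assumes hyphen-free tenant IDs.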
Level 3: project-ci
Purpose: This IAM policy provides a self-contained set of permissions for CI/CD runners (GitHub Actions, Jenkins, GitLab CI, AWS CodePipeline) to automate end-to-end ML workflows. It includes SageMaker Pipeline orchestration, container registry access, pipeline asset retrieval, configuration/secrets access, and the ability to pass execution roles to SageMaker â everything a CI/CD runner needs to function without requiring additional policy assignments.
Unlike project-dev (which targets human developers working across both CodePipeline and SageMaker), project-ci is focused on automated SageMaker Pipeline workflows with the supporting infrastructure permissions that runners need to operate independently.
Typical Users:
GitHub Actions workflows
Jenkins build jobs
GitLab CI runners
AWS CodePipeline stages
Assignment: IAM Roles only (not users)
How It Differs from project-dev:
Removes CodePipeline/CodeBuild management actions: CI/CD runners interact with SageMaker Pipelines directly, not through the CodePipeline console
Removes CloudWatch Logs read actions: runners capture logs through their own logging mechanisms (GitHub Actions logs, Jenkins console output)
Adds S3 read access for pipeline assets: runners need to download pipeline definitions, code, and model artifacts
Adds Secrets Manager and SSM Parameter Store read access: runners need configuration and secrets for pipeline execution
Adds ECR push/pull access: runners build and push container images used in ML pipeline steps
Adds iam:PassRole: runners must pass execution roles to SageMaker for training and processing jobs
Adds sagemaker:ListPipelines: runners need to discover existing pipelines to decide whether to create or update
What You Can Do:
✅ Create and update SageMaker Pipeline definitions from code
✅ Discover existing pipelines to determine create vs update
✅ Trigger pipeline executions automatically on code merge
✅ Stop pipelines on failure conditions
✅ Monitor execution progress and report step-level status
✅ Download pipeline definitions and artifacts from S3
✅ Retrieve configuration and secrets for pipeline execution
✅ Authenticate to ECR and push/pull container images
✅ Pass execution roles to SageMaker for training and processing jobs
What You Cannot Do:
❌ Delete pipelines
❌ Access other teams’ pipelines
❌ Modify platform-wide settings
❌ Delete container images or repositories
❌ Modify ECR repository policies or lifecycle rules
❌ Write to S3 (read-only access for pipeline assets)
❌ Create or modify secrets/parameters (read-only access)
❌ Pass roles to services other than SageMaker
| Restriction | Description |
|---|---|
| No Pipeline Deletion | |
| No Cross-Tenant Access | SageMaker actions are scoped to |
| No Container Deletion | ECR actions do not include |
| No S3 Write Access | Runners can read pipeline assets but cannot modify or delete them; pipeline definitions are managed through version control, not runner writes. |
| No Secrets Modification | Runners can read secrets and parameters but cannot create, update, or delete them; secrets management is an admin responsibility. |
| No Unrestricted Role Passing | |
Example Scenario:
A GitHub Actions workflow triggers on merge to main. It downloads the pipeline definition from S3, pulls the base training image from ECR, builds a new image with updated code, pushes it to ECR, creates/updates the SageMaker Pipeline definition, passes the SageMaker execution role, starts a training run, and monitors step-level execution status â reporting results back to the GitHub PR.
Sample Permissions:
[
{
"Sid": "SageMakerPipelineManagement",
"Effect": "Allow",
"Action": [
"sagemaker:CreatePipeline",
"sagemaker:UpdatePipeline",
"sagemaker:DescribePipeline",
"sagemaker:ListPipelines",
"sagemaker:StartPipelineExecution",
"sagemaker:StopPipelineExecution",
"sagemaker:DescribePipelineExecution",
"sagemaker:ListPipelineExecutions",
"sagemaker:ListPipelineExecutionSteps"
],
"Resource": "arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{tenant_id}-*"
},
{
"Sid": "PipelineAssetAccess",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::{company_prefix}-{env}-{tenant_id}-*",
"arn:aws:s3:::{company_prefix}-{env}-{tenant_id}-*/*"
]
},
{
"Sid": "ConfigurationAndSecretsAccess",
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue",
"ssm:GetParameter",
"ssm:GetParameters"
],
"Resource": [
"arn:aws:secretsmanager:*:*:secret:{company_prefix}-{env}-{tenant_id}-*",
"arn:aws:ssm:*:*:parameter/{company_prefix}-{env}-{tenant_id}/*"
]
},
{
"Sid": "ContainerRegistryAccess",
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:DescribeImages",
"ecr:InitiateLayerUpload",
"ecr:UploadLayerPart",
"ecr:CompleteLayerUpload",
"ecr:PutImage",
"ecr:TagResource"
],
"Resource": "*"
},
{
"Sid": "PassRoleToSageMaker",
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-{tenant_id}-role-*",
"Condition": {
"StringEquals": {
"iam:PassedToService": "sagemaker.amazonaws.com"
}
}
}
]
"Sid": "SageMakerPipelineManagement"
Grants the CI/CD runner full lifecycle management of SageMaker Pipelines (except deletion), scoped to the tenant’s project namespace.
Pipeline Definition Management
| Policy Action | Description |
|---|---|
| sagemaker:CreatePipeline | Create a new SageMaker Pipeline definition from code when a pipeline doesn’t yet exist. |
| sagemaker:UpdatePipeline | Update an existing pipeline definition when code is merged; this is the primary action for iterative CI/CD deployments. |
| sagemaker:DescribePipeline | Retrieve metadata and configuration of a pipeline definition. Required for the runner to verify the current state before applying updates. |
| sagemaker:ListPipelines | Discover existing pipelines in the tenant namespace. Required for the runner to determine whether to create a new pipeline or update an existing one. |
Pipeline Execution Management
| Policy Action | Description |
|---|---|
| sagemaker:StartPipelineExecution | Trigger a pipeline run automatically after a successful build or code merge. |
| sagemaker:StopPipelineExecution | Halt a running execution if automated tests or failure conditions are detected. |
| sagemaker:DescribePipelineExecution | View the status, parameters, and details of a specific execution run. Required for the runner to report pass/fail status back to GitHub/Jenkins. |
| sagemaker:ListPipelineExecutions | View the history of all runs for a pipeline. Required for the runner to check if a previous execution is still running before starting a new one. |
| sagemaker:ListPipelineExecutionSteps | Examine individual steps within an execution. Required for the runner to report step-level status (e.g., “training step failed at epoch 5”) back to the CI/CD system. |
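The create-versus-update decision and the "is a run still in progress?" check described above can be sketched as a plain function over the shapes SageMaker returns. This assumes the runner has already called ListPipelines and ListPipelineExecutions (for example through boto3) and passes in the PipelineSummaries and PipelineExecutionSummaries lists; plan_deployment is a hypothetical helper, and Executing and Stopping are the in-progress values of the PipelineExecutionStatus field.

```python
# SageMaker pipeline execution statuses that mean a run is still in progress.
ACTIVE_STATUSES = {"Executing", "Stopping"}


def plan_deployment(pipeline_name, pipeline_summaries, execution_summaries):
    """Decide what a CI runner should do for one pipeline.

    pipeline_summaries:  the PipelineSummaries list returned by ListPipelines.
    execution_summaries: the PipelineExecutionSummaries list returned by
                         ListPipelineExecutions (empty for a new pipeline).
    Returns (definition_action, start_new_run).
    """
    exists = any(p["PipelineName"] == pipeline_name for p in pipeline_summaries)
    definition_action = "update" if exists else "create"
    busy = any(e["PipelineExecutionStatus"] in ACTIVE_STATUSES
               for e in execution_summaries)
    return definition_action, not busy


pipelines = [{"PipelineName": "acme-dev-a001-fraud-detection"}]
executions = [
    {"PipelineExecutionStatus": "Succeeded"},
    {"PipelineExecutionStatus": "Executing"},
]

print(plan_deployment("acme-dev-a001-fraud-detection", pipelines, executions))  # ('update', False)
print(plan_deployment("acme-dev-a001-churn-model", pipelines, []))               # ('create', True)
```

Both List calls are paginated (they return a NextToken), so a real runner would collect every page before making this decision.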
"Sid": "PipelineAssetAccess"
Grants read-only access to S3 buckets within the tenant namespace for downloading pipeline definitions, code artifacts, and model artifacts.
| Policy Action | Description |
|---|---|
| s3:GetObject | Download pipeline definition files, code packages, and model artifacts stored in S3. |
| s3:ListBucket | List objects within the tenant’s S3 buckets to verify that required assets exist before pipeline execution. |
"Sid": "ConfigurationAndSecretsAccess"
Grants read-only access to configuration and secrets required for pipeline execution, scoped to the tenant namespace.
| Policy Action | Description |
|---|---|
| secretsmanager:GetSecretValue | Retrieve sensitive data (API keys, database credentials, external service tokens) needed during pipeline execution. |
| ssm:GetParameter | Read a single configuration parameter (e.g., model hyperparameters, feature store endpoints). |
| ssm:GetParameters | Read multiple configuration parameters in a single call for efficient pipeline initialization. |
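The two resource patterns in ConfigurationAndSecretsAccess are shaped differently for a reason. Secrets Manager appends a hyphen and six random characters to every secret ARN, which the trailing -* absorbs, while an SSM parameter named with a leading slash, such as /acme-dev-a001/feature-store-endpoint, appears in its ARN after parameter/ without that leading slash. A minimal sketch, using fnmatchcase as an approximation of IAM matching; the secret name, parameter names, and ssm_parameter_arn helper are hypothetical.

```python
from fnmatch import fnmatchcase


def ssm_parameter_arn(name, region, account):
    """Build an SSM parameter ARN; a leading '/' in the name is not repeated."""
    return f"arn:aws:ssm:{region}:{account}:parameter/{name.lstrip('/')}"


REGION, ACCOUNT = "us-west-2", "123456789012"
SECRET_PATTERN = "arn:aws:secretsmanager:*:*:secret:acme-dev-a001-*"
PARAM_PATTERN = "arn:aws:ssm:*:*:parameter/acme-dev-a001/*"

# Secrets Manager appends "-" plus six random characters to the secret name.
secret_arn = (f"arn:aws:secretsmanager:{REGION}:{ACCOUNT}:"
              "secret:acme-dev-a001-db-credentials-AbCdEf")
# Hierarchical name stored under the /acme-dev-a001/ path.
nested_param = ssm_parameter_arn("/acme-dev-a001/feature-store-endpoint", REGION, ACCOUNT)
# Flat name with the same prefix but no path separator.
flat_param = ssm_parameter_arn("acme-dev-a001-feature-store-endpoint", REGION, ACCOUNT)

print(fnmatchcase(secret_arn, SECRET_PATTERN))   # True
print(fnmatchcase(nested_param, PARAM_PATTERN))  # True
print(fnmatchcase(flat_param, PARAM_PATTERN))    # False
```

The last case shows that the parameter pattern only covers names stored under the /{company_prefix}-{env}-{tenant_id}/ path; flat names that merely share the prefix are not readable by the runner.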
"Sid": "ContainerRegistryAccess"
Enables the CI/CD runner to authenticate with ECR, pull base images, build new images, and push them to the registry for use in ML pipeline steps.
Authentication
| Policy Action | Description |
|---|---|
| ecr:GetAuthorizationToken | Retrieve a temporary authentication token to authenticate the Docker CLI to the registry. |
Read/Pull Actions
| Policy Action | Description |
|---|---|
| ecr:BatchCheckLayerAvailability | Check if specific image layers already exist in the repository (used during both pull and push). |
| ecr:GetDownloadUrlForLayer | Retrieve a URL to download a specific image layer for pulling base images. |
| ecr:BatchGetImage | Retrieve image manifests for pulling base images used in pipeline steps. |
| ecr:DescribeRepositories | View repository metadata to verify target repositories exist before pushing. |
| ecr:ListImages | List images in a repository to check for existing tags and avoid unnecessary rebuilds. |
| ecr:DescribeImages | View image metadata (size, push date, tags) for build cache optimization. |
Write/Push Actions
| Policy Action | Description |
|---|---|
| ecr:InitiateLayerUpload | Start the multi-step process of uploading a new image layer. |
| ecr:UploadLayerPart | Upload a segment of an image layer during the push process. |
| ecr:CompleteLayerUpload | Finalize the layer upload, confirming all parts have been received. |
| ecr:PutImage | Push the image manifest to the repository, making the complete container image available for use in pipeline steps. |
| ecr:TagResource | Add or update metadata tags on repositories (e.g., build number, commit SHA). |
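The push actions above are used in a fixed order during docker push: the client checks which layers already exist, uploads the missing ones in parts, and writes the manifest last. A minimal sketch that checks whether the ContainerRegistryAccess action list covers a login, a pull, and a push; the LOGIN/PULL/PUSH grouping is a simplification of the Docker client's exchange with ECR, and missing_actions is a hypothetical helper.

```python
# Simplified call sequences for authenticating, pulling, and pushing an image.
LOGIN = ["ecr:GetAuthorizationToken"]
PULL = ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"]
PUSH = [
    "ecr:BatchCheckLayerAvailability",  # skip layers the registry already has
    "ecr:InitiateLayerUpload",
    "ecr:UploadLayerPart",
    "ecr:CompleteLayerUpload",
    "ecr:PutImage",                     # manifest last, once every layer exists
]

# Actions granted by the ContainerRegistryAccess statement.
container_registry_access = {
    "ecr:GetAuthorizationToken", "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage", "ecr:DescribeRepositories",
    "ecr:ListImages", "ecr:DescribeImages", "ecr:InitiateLayerUpload",
    "ecr:UploadLayerPart", "ecr:CompleteLayerUpload", "ecr:PutImage",
    "ecr:TagResource",
}


def missing_actions(required, granted):
    """Return required actions that are not granted, preserving their order."""
    return [action for action in required if action not in granted]


print(missing_actions(LOGIN + PULL + PUSH, container_registry_access))  # []

# A policy missing the final manifest write would fail at the very end of a push.
print(missing_actions(LOGIN + PUSH, container_registry_access - {"ecr:PutImage"}))
```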
"Sid": "PassRoleToSageMaker"
Permits the CI/CD runner to pass an execution role to SageMaker so that pipeline steps (training jobs, processing jobs, transform jobs) have the compute permissions they need.
| Policy Action | Description |
|---|---|
| iam:PassRole | Assign a specific service role to the SageMaker Pipeline being created or executed. Conditioned to |
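The PassRoleToSageMaker statement combines a role-name pattern with a StringEquals condition on iam:PassedToService, and both must hold for the pass to succeed. A minimal sketch of how the two checks combine, modeling only this one statement (rendered for tenant a001) and using fnmatchcase as an approximation of IAM's ARN matching; can_pass_role and the role ARNs are hypothetical.

```python
from fnmatch import fnmatchcase

# Rendered Resource and Condition from the PassRoleToSageMaker statement.
ROLE_PATTERN = "arn:aws:iam::*:role/acme-dev-a001-role-*"
ALLOWED_SERVICE = "sagemaker.amazonaws.com"


def can_pass_role(role_arn, passed_to_service):
    """Both the Resource pattern and the StringEquals condition must hold."""
    resource_ok = fnmatchcase(role_arn, ROLE_PATTERN)
    condition_ok = passed_to_service == ALLOWED_SERVICE  # StringEquals is case-sensitive
    return resource_ok and condition_ok


own_role = "arn:aws:iam::123456789012:role/acme-dev-a001-role-sagemaker-exec"
other_role = "arn:aws:iam::123456789012:role/acme-dev-b002-role-sagemaker-exec"

print(can_pass_role(own_role, "sagemaker.amazonaws.com"))    # True
print(can_pass_role(own_role, "ec2.amazonaws.com"))          # False
print(can_pass_role(other_role, "sagemaker.amazonaws.com"))  # False
```

The second case is the one the condition exists for: even a correctly named tenant role cannot be handed to a different service, such as EC2, through this statement.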
GitHub Actions Example:
- name: Configure AWS Credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/acme-dev-a001-role-ci-runner
    aws-region: us-west-2
- name: Login to ECR
  run: |
    aws ecr get-login-password --region us-west-2 | \
      docker login --username AWS --password-stdin \
      123456789012.dkr.ecr.us-west-2.amazonaws.com
- name: Build and Push Training Image
  run: |
    docker build -t acme-dev-a001-ml-training:${{ github.sha }} .
    docker tag acme-dev-a001-ml-training:${{ github.sha }} \
      123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}
    docker push \
      123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}
- name: Deploy SageMaker Pipeline
  run: |
    python deploy_pipeline.py \
      --pipeline-name acme-dev-a001-fraud-detection \
      --role-arn ${{ secrets.SAGEMAKER_ROLE_ARN }} \
      --image-uri 123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}
- name: Start Pipeline Execution
  # Full URI of the training image pushed above, passed as the ImageUri pipeline parameter.
  env:
    IMAGE_URI: 123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}
  run: |
    aws sagemaker start-pipeline-execution \
      --pipeline-name acme-dev-a001-fraud-detection \
      --pipeline-parameters '[{"Name":"ImageUri","Value":"'"$IMAGE_URI"'"}]'
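Hand-quoting JSON inside a shell string, as the Start Pipeline Execution step does, is easy to get wrong. An alternative sketch builds the --pipeline-parameters value in Python with json.dumps; it assumes the pipeline declares a parameter named ImageUri, as the workflow above does, and pipeline_parameters is a hypothetical helper. The commit suffix in the image URI is illustrative.

```python
import json


def pipeline_parameters(**params):
    """Convert keyword arguments into SageMaker's list of Name/Value pairs."""
    # SageMaker expects every pipeline parameter value as a string.
    return [{"Name": name, "Value": str(value)} for name, value in params.items()]


image_uri = ("123456789012.dkr.ecr.us-west-2.amazonaws.com/"
             "acme-dev-a001-ml-training:3f2c1ab")
params = pipeline_parameters(ImageUri=image_uri)

# JSON string suitable for: aws sagemaker start-pipeline-execution --pipeline-parameters "$PARAMS"
cli_value = json.dumps(params)
print(cli_value)
```

With boto3, the list itself (not the JSON string) would be passed as the PipelineParameters argument of start_pipeline_execution.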
Compliance Note: In regulated industries, auditors can trace:
Human actions → IAM user CloudTrail logs (project-dev)
Automated actions → IAM role CloudTrail logs (project-ci)
This separation supports the audit-trail and separation-of-duties expectations of frameworks such as SOC 2, HIPAA, and PCI DSS.
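A minimal sketch of how an auditor might split CloudTrail events along that line, assuming the records have already been exported and parsed into dicts. It relies on the userIdentity.type field of CloudTrail records, where IAM users appear as IAMUser and role sessions as AssumedRole (with the role name under sessionContext.sessionIssuer). This split only holds while humans sign in as IAM users, as this guide assumes; people federated through IAM Identity Center also appear as AssumedRole. classify_actor is a hypothetical helper, and the sample events are abbreviated.

```python
def classify_actor(record):
    """Label a CloudTrail record as human, automation, or other."""
    identity = record.get("userIdentity", {})
    kind = identity.get("type")
    if kind == "IAMUser":
        return "human", identity.get("userName")
    if kind == "AssumedRole":
        issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
        return "automation", issuer.get("userName")
    return "other", kind


# Abbreviated records: real CloudTrail events carry many more fields.
events = [
    {"eventSource": "sagemaker.amazonaws.com", "eventName": "StartPipelineExecution",
     "userIdentity": {"type": "IAMUser", "userName": "tom"}},
    {"eventSource": "sagemaker.amazonaws.com", "eventName": "UpdatePipeline",
     "userIdentity": {"type": "AssumedRole",
                      "sessionContext": {"sessionIssuer": {
                          "type": "Role", "userName": "acme-dev-a001-role-ci-runner"}}}},
]

for event in events:
    print(event["eventName"], classify_actor(event))
```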
Level 4: project-full
⚠️ Reference Pattern: Not Generated by sec-provisioner
This policy requires a specific project name in the resource ARN (e.g., fraud-detection, recommendation-engine). Since project names are not known at platform provisioning time, this policy is not generated by the sec-provisioner. It is documented here as a reference pattern to be applied during project onboarding, once the project name is known.
Purpose: Full pipeline control for human team members working on a specific ML project
Principal: Human (project engineers and data scientists)
Typical Users:
ML engineers (project-focused)
Data scientists (running experiments)
Project teams (isolated access)
Assignment: Attached to project-specific IAM groups created during project onboarding
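How It Differs from project-ci (Level 3):
Similar resource scope: both are limited by a naming prefix, project-ci to the tenant namespace ({company_prefix}-{env}-{tenant_id}-*) and project-full to a single project ({company_prefix}-{env}-{project}-*)
Different principal: project-ci is for automated CI/CD runners, project-full is for humans
Removes runner-specific actions: no iam:PassRole, no SSM/Secrets access, no S3/ECR asset access
Adds log access: PipelineLogsAccess grants logs:GetLogEvents and logs:DescribeLogStreams for interactive troubleshooting (runners rely on their own CI/CD logs instead)
Keeps discovery and step-level debugging: ListPipelines and ListPipelineExecutionSteps, which humans use to browse their project’s pipelines and trace which step failed
Adds explicit Deny: the DenyCriticalActions Sid blocks sagemaker:DeletePipeline as a safety net (runners don’t need this because delete is never in their Allow)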
What You Can Do:
✅ Create and update pipelines for your project
✅ Start and stop pipeline executions
✅ View execution logs, execution status, and step-level details
✅ List and discover your project’s pipelines
What You Cannot Do:
❌ Access other teams’ pipelines
❌ Modify shared/platform pipelines
❌ Delete pipelines (explicit Deny)
❌ Pass IAM roles or access secrets (those are runner concerns)
Example Scenario:
The fraud-detection team needs to run their training pipeline without accessing the recommendation-engine team’s pipelines. Engineers create, update, and monitor pipelines interactively, but deletion requires a platform admin.
Resource Scope:
All SageMaker resources are scoped to the project name within the tenant’s naming convention. Replace {project} with the actual project name at onboarding time.
arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{project}-*
Example (for Edge Corp, prod environment, fraud-detection project):
arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*
Sample Permissions:
[
{
"Sid": "PipelineManagement",
"Effect": "Allow",
"Action": [
"sagemaker:CreatePipeline",
"sagemaker:UpdatePipeline"
],
"Resource": "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
},
{
"Sid": "PipelineExecution",
"Effect": "Allow",
"Action": [
"sagemaker:StartPipelineExecution",
"sagemaker:StopPipelineExecution"
],
"Resource": "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
},
{
"Sid": "MonitoringAndVisibility",
"Effect": "Allow",
"Action": [
"sagemaker:DescribePipeline",
"sagemaker:DescribePipelineExecution",
"sagemaker:ListPipelines",
"sagemaker:ListPipelineExecutions",
"sagemaker:ListPipelineExecutionSteps"
],
"Resource": "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
},
{
"Sid": "PipelineLogsAccess",
"Effect": "Allow",
"Action": [
"logs:GetLogEvents",
"logs:DescribeLogStreams"
],
"Resource": "*"
},
{
"Sid": "DenyCriticalActions",
"Effect": "Deny",
"Action": [
"sagemaker:DeletePipeline"
],
"Resource": "*"
}
]
"Sid": "PipelineManagement"
Allows users to build and modify project-specific ML workflows. Restricted via the Resource ARN to only pipelines matching the project’s naming convention, preventing interference with other teams.
| Policy Action | Description |
|---|---|
| sagemaker:CreatePipeline | Define a new sequence of ML steps (data prep, training, etc.) for this project. |
| sagemaker:UpdatePipeline | Modify existing pipeline definitions as project requirements evolve. |
"Sid": "PipelineExecution"
Grants operational control to run or halt project experiments. Ensures data scientists can iterate on models without requiring admin intervention.
| Policy Action | Description |
|---|---|
| sagemaker:StartPipelineExecution | Triggers a new run of the ML pipeline using specified data or parameters. |
| sagemaker:StopPipelineExecution | Allows engineers to manually kill a run if errors are detected, saving compute costs. |
"Sid": "MonitoringAndVisibility"
Provides read access for interactive debugging and troubleshooting. Unlike project-ci (which only needs execution-level status), humans need step-level detail and pipeline discovery to work effectively.
| Policy Action | Description |
|---|---|
| sagemaker:DescribePipeline | Retrieves pipeline metadata including ARN, name, creation time, status, and associated IAM identity. |
| sagemaker:DescribePipelineExecution | Returns details about a specific execution such as ARN, status, creation time, and failure reasons. |
| sagemaker:ListPipelines | Lets humans browse and select pipelines interactively. This action does not support resource-level permissions, so it only takes effect when granted on Resource "*" (which lists every pipeline in the account). Under the project-scoped ARN above, it is implicitly denied. |
| sagemaker:ListPipelineExecutions | View the history of all runs for a pipeline. Lists execution summaries for troubleshooting and tracking. |
| sagemaker:ListPipelineExecutionSteps | Inspect individual steps within an execution. Essential for humans debugging which step failed and why. |
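The step-level debugging that ListPipelineExecutionSteps enables can be scripted. The helper below is a minimal sketch that pages through the steps of one execution and collects the failed ones. It assumes a boto3 SageMaker client is passed in (for example boto3.client("sagemaker")). It relies on the StepName, StepStatus, and FailureReason fields of the ListPipelineExecutionSteps response. The function name is illustrative.

```python
from typing import Any, Dict, List

def failed_steps(sm_client: Any, execution_arn: str) -> List[Dict[str, str]]:
    """Return the name and failure reason of every failed step in one pipeline execution.

    sm_client is assumed to be a boto3 SageMaker client; the caller needs
    sagemaker:ListPipelineExecutionSteps on the execution ARN.
    """
    failures: List[Dict[str, str]] = []
    kwargs = {"PipelineExecutionArn": execution_arn}
    while True:
        page = sm_client.list_pipeline_execution_steps(**kwargs)
        for step in page.get("PipelineExecutionSteps", []):
            if step.get("StepStatus") == "Failed":
                failures.append({
                    "step": step.get("StepName", "<unknown>"),
                    "reason": step.get("FailureReason", "<no reason reported>"),
                })
        token = page.get("NextToken")
        if not token:  # last page reached
            return failures
        kwargs["NextToken"] = token
```

Typical usage would be failed_steps(boto3.client("sagemaker"), execution_arn), with execution_arn taken from ListPipelineExecutions.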
"Sid": "PipelineLogsAccess"
Separated from MonitoringAndVisibility because CloudWatch log group names are generated by AWS at runtime and cannot be scoped to a project prefix. Uses Resource: * out of necessity, not by choice.
| Policy Action | Description |
|---|---|
| logs:GetLogEvents | Retrieves log events from a CloudWatch Logs log stream, allowing filtering by time range. |
| logs:DescribeLogStreams | Lists log streams within a log group, with options to filter by prefix or order by last event time. |
"Sid": "DenyCriticalActions"
Explicit safety net to prevent accidental or unauthorized deletion. In IAM, an explicit Deny always overrides an Allow. No other policy (including any future policy change) can therefore grant deletion rights to this group.
| Policy Action | Description |
|---|---|
| sagemaker:DeletePipeline | Specifically blocked to ensure that even project members cannot permanently remove pipeline infrastructure. Deletion is reserved for Level 5: platform-full. |
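The DenyCriticalActions statement holds even if someone later attaches a broader policy. To see why, here is a deliberately simplified model of IAM's evaluation order for identity policies. A matching explicit Deny wins. Otherwise, a matching Allow grants access. Otherwise, the request is implicitly denied. This is a teaching sketch, not the real IAM engine. It ignores Conditions, NotAction/NotResource, SCPs, permissions boundaries, and resource-based policies. The broad_grant policy and the ARNs are hypothetical.

```python
import re
from typing import Dict, List, Union

Statement = Dict[str, Union[str, List[str]]]

def _as_list(value: Union[str, List[str]]) -> List[str]:
    return [value] if isinstance(value, str) else list(value)

def _match(pattern: str, value: str, ignore_case: bool = False) -> bool:
    # '*' -> any run of characters, '?' -> one character. Action names compare
    # case-insensitively; resource ARNs compare case-sensitively.
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, re.IGNORECASE if ignore_case else 0) is not None

def evaluate(statements: List[Statement], action: str, resource: str) -> str:
    """Return 'ExplicitDeny', 'Allow', or 'ImplicitDeny' for one request."""
    matching = [
        s for s in statements
        if any(_match(a, action, ignore_case=True) for a in _as_list(s["Action"]))
        and any(_match(r, resource) for r in _as_list(s["Resource"]))
    ]
    if any(s["Effect"] == "Deny" for s in matching):
        return "ExplicitDeny"  # an explicit Deny overrides every Allow
    if any(s["Effect"] == "Allow" for s in matching):
        return "Allow"
    return "ImplicitDeny"  # nothing matched: denied by default

PROJECT = "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
project_full = [
    {"Sid": "PipelineManagement", "Effect": "Allow",
     "Action": ["sagemaker:CreatePipeline", "sagemaker:UpdatePipeline"], "Resource": PROJECT},
    {"Sid": "DenyCriticalActions", "Effect": "Deny",
     "Action": ["sagemaker:DeletePipeline"], "Resource": "*"},
]
# Hypothetical second policy that tries to grant everything, including delete.
broad_grant = [{"Sid": "TooBroad", "Effect": "Allow", "Action": "sagemaker:*", "Resource": "*"}]

pipeline = "arn:aws:sagemaker:us-east-1:111122223333:pipeline/edge-prod-fraud-detection-train"
print(evaluate(project_full, "sagemaker:UpdatePipeline", pipeline))                # Allow
print(evaluate(project_full + broad_grant, "sagemaker:DeletePipeline", pipeline))  # ExplicitDeny
```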
Level 5: platform-full
Purpose: Platform-wide pipeline management across all projects and tenants
Principal: Human (platform administrators)
Typical Users:
MLOps platform team
Pipeline infrastructure owners
Cross-project coordinators
Assignment: Platform admin IAM groups (e.g., {company_prefix}-{env}-group-platform-admins)
How It Differs from project-full (Level 4):
Scope breaks out from project to account-wide: Resource: * instead of project-scoped ARNs
Adds delete: sagemaker:DeletePipeline and codepipeline:DeletePipeline (only level that can delete)
Adds CodePipeline management: Levels 1-4 focus on SageMaker Pipelines; Level 5 adds full CI/CD pipeline control
Adds governance actions: approval gates, stage transitions, pipeline freezing
Adds PassRole: can assign IAM roles to pipelines (scoped to SageMaker and CodePipeline services)
No explicit Deny: this is the level where delete is intentionally allowed
What You Can Do:
✅ Manage all SageMaker pipelines across all projects and tenants
✅ Manage all CodePipeline CI/CD pipelines across the platform
✅ Create shared/platform pipelines
✅ Delete obsolete pipelines (SageMaker and CodePipeline)
✅ Approve/reject deployment gates
✅ Freeze and unfreeze pipeline stages
✅ Assign IAM roles to pipelines
What You Cannot Do:
❌ Nothing: full pipeline access across both SageMaker and CodePipeline
Example Scenario:
The MLOps team maintains a shared data preprocessing pipeline used by all ML projects and needs to update it with new validation steps. They also need to decommission a retired project's pipelines and approve a production deployment gate.
Resource Scope:
Account-wide, with no tenant or project scoping. Platform admins need cross-cutting access to manage the entire pipeline infrastructure.
Resource: "*"
Sample Permissions:
[
{
"Sid": "SageMakerPipelineFullAccess",
"Effect": "Allow",
"Action": [
"sagemaker:CreatePipeline",
"sagemaker:UpdatePipeline",
"sagemaker:DeletePipeline",
"sagemaker:DescribePipeline",
"sagemaker:ListPipelines",
"sagemaker:StartPipelineExecution",
"sagemaker:StopPipelineExecution",
"sagemaker:DescribePipelineExecution",
"sagemaker:ListPipelineExecutions",
"sagemaker:ListPipelineExecutionSteps"
],
"Resource": "*"
},
{
"Sid": "CodePipelineFullAccess",
"Effect": "Allow",
"Action": [
"codepipeline:CreatePipeline",
"codepipeline:UpdatePipeline",
"codepipeline:DeletePipeline",
"codepipeline:GetPipeline",
"codepipeline:ListPipelines",
"codepipeline:GetPipelineState",
"codepipeline:GetPipelineExecution",
"codepipeline:StartPipelineExecution",
"codepipeline:StopPipelineExecution",
"codepipeline:RetryStageExecution",
"codepipeline:RollbackStage"
],
"Resource": "*"
},
{
"Sid": "PipelineGovernance",
"Effect": "Allow",
"Action": [
"codepipeline:PutApprovalResult",
"codepipeline:DisableStageTransition",
"codepipeline:EnableStageTransition"
],
"Resource": "*"
},
{
"Sid": "PipelineLogsFullAccess",
"Effect": "Allow",
"Action": [
"logs:GetLogEvents",
"logs:DescribeLogStreams",
"logs:DescribeLogGroups",
"logs:FilterLogEvents"
],
"Resource": "*"
},
{
"Sid": "PassRoleToPipelineServices",
"Effect": "Allow",
"Action": [
"iam:PassRole"
],
"Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*",
"Condition": {
"StringEquals": {
"iam:PassedToService": [
"sagemaker.amazonaws.com",
"codepipeline.amazonaws.com"
]
}
}
}
]
"Sid": "SageMakerPipelineFullAccess"
Full control over all SageMaker Pipelines across every project and tenant. This is the only level that includes sagemaker:DeletePipeline; all lower levels either omit it or explicitly deny it. Platform admins use this level to manage the complete lifecycle of ML pipelines, including decommissioning retired projects.
| Policy Action | Description |
|---|---|
| sagemaker:CreatePipeline | Create new SageMaker pipeline definitions for any project or shared infrastructure. |
| sagemaker:UpdatePipeline | Modify any existing pipeline definition across the platform. |
| sagemaker:DeletePipeline | Permanently remove obsolete or retired pipelines. Only available at this level. |
| sagemaker:DescribePipeline | Retrieve metadata for any pipeline including ARN, status, and associated IAM identity. |
| sagemaker:ListPipelines | Discover all pipelines across the entire account for cross-project visibility. |
| sagemaker:StartPipelineExecution | Trigger execution of any pipeline for cross-project coordination or incident response. |
| sagemaker:StopPipelineExecution | Halt any running pipeline execution across the platform. |
| sagemaker:DescribePipelineExecution | Inspect execution details including status, timing, and failure reasons for any pipeline. |
| sagemaker:ListPipelineExecutions | View execution history across all pipelines for platform-wide monitoring. |
| sagemaker:ListPipelineExecutionSteps | Inspect step-level details within any execution for deep troubleshooting. |
"Sid": "CodePipelineFullAccess"
Full control over all CodePipeline CI/CD pipelines. This extends platform-full beyond SageMaker into the CI/CD layer, giving the MLOps team end-to-end pipeline management from source code through to model deployment.
| Policy Action | Description |
|---|---|
| codepipeline:CreatePipeline | Create new CI/CD pipelines for any project or shared infrastructure. |
| codepipeline:UpdatePipeline | Modify the structure or settings of any existing pipeline. |
| codepipeline:DeletePipeline | Permanently remove obsolete CI/CD pipeline configurations. |
| codepipeline:GetPipeline | View the JSON structure and configuration of any pipeline. |
| codepipeline:ListPipelines | List all CI/CD pipelines in the account for platform-wide visibility. |
| codepipeline:GetPipelineState | Real-time view of stage and action status (Succeeded, In Progress, Failed). |
| codepipeline:GetPipelineExecution | View details and history of a specific execution run. |
| codepipeline:StartPipelineExecution | Manually trigger any pipeline for cross-project coordination. |
| codepipeline:StopPipelineExecution | Force-stop a running pipeline mid-process. |
| codepipeline:RetryStageExecution | Restart a failed stage without rerunning the entire pipeline. |
| codepipeline:RollbackStage | Revert a stage to a previous successful state for incident recovery. |
"Sid": "PipelineGovernance"
Governance actions for deployment control. Allows platform admins to approve or reject deployment gates, freeze pipeline stages during incidents, and resume flow when resolved. Separated from CodePipelineFullAccess because these are administrative/governance actions, not pipeline CRUD.
| Policy Action | Description |
|---|---|
| codepipeline:PutApprovalResult | Approve or reject a manual approval gate to move a deployment forward. |
| codepipeline:DisableStageTransition | Freeze a pipeline stage to prevent progression (e.g., during an incident or change freeze). |
| codepipeline:EnableStageTransition | Re-enable flow between stages after a freeze is lifted. |
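The approval-gate flow behind PutApprovalResult needs the token that CodePipeline issues for the waiting approval action, and GetPipelineState exposes that token. The sketch below is minimal and assumes a boto3 CodePipeline client (boto3.client("codepipeline")) is passed in. The helper names are hypothetical, and only the first pending approval is handled. Reading the token also requires codepipeline:GetPipelineState, which this level grants in CodePipelineFullAccess.

```python
from typing import Any, Optional, Tuple

def find_pending_approval(state: dict) -> Optional[Tuple[str, str, str]]:
    """Return (stage, action, token) for the first approval waiting for a decision.

    `state` is a GetPipelineState response. A waiting manual approval reports
    latestExecution.status == "InProgress" together with an approval token.
    """
    for stage in state.get("stageStates", []):
        for action in stage.get("actionStates", []):
            latest = action.get("latestExecution", {})
            if latest.get("status") == "InProgress" and latest.get("token"):
                return stage["stageName"], action["actionName"], latest["token"]
    return None

def approve(codepipeline: Any, pipeline_name: str, summary: str, approved: bool = True) -> bool:
    """Approve or reject the pending gate (hypothetical helper); False if nothing is waiting."""
    pending = find_pending_approval(codepipeline.get_pipeline_state(name=pipeline_name))
    if pending is None:
        return False
    stage, action, token = pending
    codepipeline.put_approval_result(
        pipelineName=pipeline_name,
        stageName=stage,
        actionName=action,
        result={"summary": summary, "status": "Approved" if approved else "Rejected"},
        token=token,
    )
    return True
```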
"Sid": "PipelineLogsFullAccess"
Full CloudWatch Logs access for platform-wide pipeline troubleshooting. This adds DescribeLogGroups and FilterLogEvents beyond what lower levels have, because platform admins need to discover log groups across all projects and search across log streams.
| Policy Action | Description |
|---|---|
| logs:GetLogEvents | Retrieve log events from any pipeline's log stream. |
| logs:DescribeLogStreams | List log streams within any log group for cross-project investigation. |
| logs:DescribeLogGroups | Discover all log groups across the account, as needed for platform-wide visibility. |
| logs:FilterLogEvents | Search across log streams within a log group, which is essential for incident investigation across projects. |
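FilterLogEvents is what makes cross-stream searches practical during an incident. Here is a minimal sketch. It assumes a boto3 CloudWatch Logs client (boto3.client("logs")) is passed in and uses that client's filter_log_events paginator. The function name is illustrative. Check which log group your pipeline steps actually write to before relying on a specific name.

```python
from typing import Any, Iterator, Optional

def search_logs(logs_client: Any, log_group: str, filter_pattern: str,
                start_time_ms: Optional[int] = None) -> Iterator[dict]:
    """Yield matching events from every stream in a log group (illustrative helper).

    Requires logs:FilterLogEvents; startTime is epoch milliseconds.
    """
    params = {"logGroupName": log_group, "filterPattern": filter_pattern}
    if start_time_ms is not None:
        params["startTime"] = start_time_ms
    paginator = logs_client.get_paginator("filter_log_events")
    for page in paginator.paginate(**params):
        for event in page.get("events", []):
            yield {
                "stream": event.get("logStreamName"),
                "timestamp": event.get("timestamp"),
                "message": event.get("message", "").rstrip(),
            }
```

For example, list(search_logs(boto3.client("logs"), "/aws/sagemaker/TrainingJobs", "ERROR")) returns the matching training-job log lines across all streams in that group.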
"Sid": "PassRoleToPipelineServices"
Allows platform admins to assign IAM roles to both SageMaker and CodePipeline services. The statement is scoped to roles that match the platform's naming convention. It is also conditioned so that roles can only be passed to pipeline services, which prevents anyone from using this permission to escalate privileges through other AWS services.
| Policy Action | Description |
|---|---|
| iam:PassRole | Assign IAM service roles to pipelines. Scoped by the role-name Resource pattern and the iam:PassedToService condition in the statement above. |
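Both the Resource pattern and the iam:PassedToService condition must hold for a PassRole call to be allowed. The sketch below mirrors that logic in plain Python, so you can check the scoping against candidate role names. It approximates IAM wildcard matching. It treats StringEquals against a list as "equals any entry", which is how IAM evaluates a multi-valued condition like this one. The role ARNs and the company_prefix/env values are hypothetical.

```python
import re

def _match(pattern: str, value: str) -> bool:
    # Approximate IAM ARN wildcards: '*' any run of characters, '?' one character.
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value) is not None

def pass_role_allowed(role_arn: str, passed_to_service: str, company_prefix: str, env: str,
                      allowed_services=("sagemaker.amazonaws.com", "codepipeline.amazonaws.com")) -> bool:
    """Mirror PassRoleToPipelineServices: the role must match the naming pattern AND
    the target service must equal one of the listed values."""
    resource_pattern = f"arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*"
    return _match(resource_pattern, role_arn) and passed_to_service in allowed_services

role = "arn:aws:iam::111122223333:role/edge-prod-pipeline-role-exec"  # hypothetical role
print(pass_role_allowed(role, "sagemaker.amazonaws.com", "edge", "prod"))  # True
print(pass_role_allowed(role, "lambda.amazonaws.com", "edge", "prod"))     # False
```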
Inference Policies
Inference policies control access to deployed ML models and prediction services. S3, ECR, and Pipeline policies each target a single AWS service. Inference, by contrast, spans multiple services, each with its own permission model and use cases.
| Service | Use Case | Levels |
|---|---|---|
| SageMaker Inference | Real-time endpoints, batch transform, async/serverless inference, autoscaling | 4 |
| Lambda Inference | Lightweight model serving, custom inference containers, event-driven predictions | 3 |
| Bedrock Inference | Foundation model invocation, cross-region inference, provisioned throughput | 3 |
Each service gets the number of levels its permission model actually needs, with no artificial uniformity.
SageMaker Inference
SageMaker Inference policies control access to deployed ML models and endpoints. The level progression is invoke-centric: who can call the model, and in which environment.
Level 1: read-only
Purpose: This IAM policy provides read-only access for monitoring the health of SageMaker endpoints without granting permissions to invoke predictions (no inference costs)
Principal: Human (auditors, monitoring teams)
Typical Users:
Compliance auditors
QA teams
Monitoring dashboards
Cost optimization analysts
New team members learning the platform
What You Can Do:
✅ View endpoint status and health
✅ List all endpoints and configurations
✅ See endpoint metadata (instance type, model version)
✅ Monitor endpoint metrics (latency, error rates)
✅ Check autoscaling settings
What You Cannot Do:
❌ Invoke endpoints (send prediction requests)
❌ Create or modify endpoints
❌ Delete endpoints
Example Scenario:
Sarah is a QA engineer who needs to verify that all production endpoints use the approved instance types and have autoscaling enabled. She needs to see endpoint configurations but doesn't need to send prediction requests.
Sample Permissions:
[
{
"Sid": "SageMakerEndpointReadOnly",
"Effect": "Allow",
"Action": [
"sagemaker:ListEndpoints",
"sagemaker:DescribeEndpoint",
"sagemaker:ListEndpointConfigs",
"sagemaker:DescribeEndpointConfig",
"sagemaker:ListModels",
"sagemaker:DescribeModel",
"sagemaker:DescribeModelPackage",
"sagemaker:ListModelPackages"
],
"Resource": "*"
},
{
"Sid": "CloudWatchMetricsReadOnly",
"Effect": "Allow",
"Action": [
"cloudwatch:GetMetricData",
"cloudwatch:GetMetricStatistics",
"cloudwatch:ListMetrics"
],
"Resource": "*"
},
{
"Sid": "AutoScalingReadOnly",
"Effect": "Allow",
"Action": [
"application-autoscaling:DescribeScalableTargets",
"application-autoscaling:DescribeScalingPolicies"
],
"Resource": "*"
},
{
"Sid": "ExplicitDenyInference",
"Effect": "Deny",
"Action": [
"sagemaker:InvokeEndpoint",
"sagemaker:InvokeEndpointAsync"
],
"Resource": "*"
}
]
"Sid": "SageMakerEndpointReadOnly"
Grants view-only access to SageMaker hosting resources, allowing the user to see lists and configurations for models and endpoints.
| Policy Action | Description |
|---|---|
| sagemaker:ListEndpoints | Returns a list of all existing endpoints in your account, including their names and ARNs. |
| sagemaker:DescribeEndpoint | Returns detailed information about a specific endpoint (e.g., status, configuration name). |
| sagemaker:ListEndpointConfigs | Lists all endpoint configurations (the blueprints for endpoints). |
| sagemaker:DescribeEndpointConfig | Returns details of an endpoint configuration, such as instance types and model names. |
| sagemaker:ListModels | Lists all models currently created in SageMaker. |
| sagemaker:DescribeModel | Returns details about a model, including the container image and execution role. |
| sagemaker:DescribeModelPackage | Provides information about a specific versioned model package. |
| sagemaker:ListModelPackages | Lists all available model packages or groups. |
"Sid": "CloudWatchMetricsReadOnly"
Provides access to view performance data and statistics for monitoring the health and usage of the resources.
| Policy Action | Description |
|---|---|
| cloudwatch:GetMetricData | Retrieves raw data points for various metrics across multiple resources. |
| cloudwatch:GetMetricStatistics | Gets specific statistical data (Average, Sum, Max, etc.) for a metric. |
| cloudwatch:ListMetrics | Lists all the valid metric names available to be viewed. |
"Sid": "AutoScalingReadOnly"
Allows the user to see the scaling configurations and policies applied to the endpoints.
| Policy Action | Description |
|---|---|
| application-autoscaling:DescribeScalableTargets | Shows which resources (endpoints) are set up to scale automatically. |
| application-autoscaling:DescribeScalingPolicies | Shows the specific rules that trigger a scale-up or scale-down event. |
"Sid": "ExplicitDenyInference"
Specifically blocks the ability to send data through an endpoint for predictions, ensuring the policy remains "read-only."
| Policy Action | Description |
|---|---|
| sagemaker:InvokeEndpoint | (Denied) The action required to send a synchronous request to an endpoint for a prediction. |
| sagemaker:InvokeEndpointAsync | (Denied) The action required to send an asynchronous request for long-running inferences. |
Cost Benefit:
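Because InvokeEndpoint and InvokeEndpointAsync are explicitly denied, holders of this level cannot send inference requests or drive inference charges. That makes it a good fit for monitoring and audit use cases.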
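Sarah's audit from the example scenario can be automated using only actions granted at this level: DescribeEndpoint, DescribeEndpointConfig, and DescribeScalableTargets. This is a minimal sketch that assumes boto3 "sagemaker" and "application-autoscaling" clients are passed in. The function name, the approved instance types, and the endpoint name are placeholders. Serverless variants, which have no instance type, are skipped rather than checked.

```python
from typing import Any, Iterable, List

def audit_endpoint(sm_client: Any, autoscaling_client: Any, endpoint_name: str,
                   approved_instance_types: Iterable[str]) -> List[str]:
    """Return human-readable findings; an empty list means the endpoint passes."""
    approved = set(approved_instance_types)
    findings: List[str] = []
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config = sm_client.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"])
    for variant in config["ProductionVariants"]:
        name = variant["VariantName"]
        instance_type = variant.get("InstanceType")
        if instance_type is None:
            # Serverless variant: no instance type to check; this sketch skips it.
            continue
        if instance_type not in approved:
            findings.append(f"{name}: instance type {instance_type} is not approved")
        targets = autoscaling_client.describe_scalable_targets(
            ServiceNamespace="sagemaker",
            ResourceIds=[f"endpoint/{endpoint_name}/variant/{name}"],
            ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        )["ScalableTargets"]
        if not targets:
            findings.append(f"{name}: autoscaling is not configured")
    return findings
```

For example, you could call audit_endpoint(boto3.client("sagemaker"), boto3.client("application-autoscaling"), "acme-prod-fraud-detection", {"ml.m5.xlarge"}) once for each production endpoint returned by ListEndpoints.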
Level 1-prod: read-only-invoke
Purpose: Read-only monitoring access with production invoke permissions. Designed for non-technical users who consume model predictions through dashboards and applications.
Principal: Human (business consumers, product managers, analysts)
Typical Users:
Business consumers using ML-powered dashboards
Product managers validating model outputs
Analysts running predictions for business decisions
Applications calling endpoints on behalf of business users
How It Differs from read-only (Level 1):
Adds invoke: can send prediction requests to endpoints in all environments, including production
Same read-only baseline: identical list/describe/monitor permissions
No endpoint lifecycle: cannot create, modify, or delete endpoints
What You Can Do:
✅ Everything in read-only (Level 1), PLUS:
✅ Invoke sandbox, dev, staging, and production endpoints
✅ Send real-time and async prediction requests
✅ Consume ML models through applications and dashboards
What You Cannot Do:
❌ Create, modify, or delete endpoints or endpoint configs
❌ Register or manage models
❌ Access training jobs or notebooks
Example Scenario:
Lisa is a product manager who uses an internal dashboard powered by a fraud detection model. The dashboard calls the production SageMaker endpoint to score transactions in real-time. Lisa needs invoke access to production but should never modify the endpoint or model behind it.
Sample Permissions:
[
{
"Sid": "SageMakerEndpointReadOnly",
"Effect": "Allow",
"Action": [
"sagemaker:ListEndpoints",
"sagemaker:DescribeEndpoint",
"sagemaker:ListEndpointConfigs",
"sagemaker:DescribeEndpointConfig",
"sagemaker:ListModels",
"sagemaker:DescribeModel",
"sagemaker:DescribeModelPackage",
"sagemaker:ListModelPackages"
],
"Resource": "*"
},
{
"Sid": "CloudWatchMetricsReadOnly",
"Effect": "Allow",
"Action": [
"cloudwatch:GetMetricData",
"cloudwatch:GetMetricStatistics",
"cloudwatch:ListMetrics"
],
"Resource": "*"
},
{
"Sid": "AutoScalingReadOnly",
"Effect": "Allow",
"Action": [
"application-autoscaling:DescribeScalableTargets",
"application-autoscaling:DescribeScalingPolicies"
],
"Resource": "*"
},
{
"Sid": "InvokeAllEnvironments",
"Effect": "Allow",
"Action": [
"sagemaker:InvokeEndpoint",
"sagemaker:InvokeEndpointAsync"
],
"Resource": "arn:aws:sagemaker:*:*:endpoint/*"
}
]
"Sid": "SageMakerEndpointReadOnly"
Identical to Level 1. Grants view-only access to SageMaker hosting resources.
"Sid": "CloudWatchMetricsReadOnly"
Identical to Level 1. Provides access to view performance data and statistics.
"Sid": "AutoScalingReadOnly"
Identical to Level 1. Allows the user to see scaling configurations.
"Sid": "InvokeAllEnvironments"
Allows sending prediction requests to endpoints in all environments (sandbox, dev, staging, production). Scoped to endpoint/*, so there is no access to endpoint configs, models, or training resources.
| Policy Action | Description |
|---|---|
| sagemaker:InvokeEndpoint | Sends a real-time inference request to any running endpoint to get a prediction. |
| sagemaker:InvokeEndpointAsync | Sends an inference request to any asynchronous endpoint (used for large payloads or long processing times). |
Key Difference from Level 1: No ExplicitDenyInference Sid. Instead, invoke is explicitly allowed across all environments. The read-only baseline remains identical.
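One way to verify that this level differs from Level 1 only in its final statement is to diff the two policy documents by Sid. Below is a minimal sketch in plain Python. The statement lists are trimmed copies of the samples above, and the helper name is hypothetical.

```python
import json
from typing import Dict, List, Tuple

def diff_by_sid(old: List[dict], new: List[dict]) -> Tuple[List[str], List[str], List[str]]:
    """Return (removed, added, changed) Sids between two statement lists.

    Statements are serialised with sorted keys, so key order is ignored;
    reordering entries inside an Action list still counts as a change.
    """
    def index(statements: List[dict]) -> Dict[str, str]:
        return {s["Sid"]: json.dumps(s, sort_keys=True) for s in statements}

    a, b = index(old), index(new)
    removed = sorted(a.keys() - b.keys())
    added = sorted(b.keys() - a.keys())
    changed = sorted(sid for sid in a.keys() & b.keys() if a[sid] != b[sid])
    return removed, added, changed

INVOKE = ["sagemaker:InvokeEndpoint", "sagemaker:InvokeEndpointAsync"]
baseline = {"Sid": "AutoScalingReadOnly", "Effect": "Allow",
            "Action": ["application-autoscaling:DescribeScalableTargets",
                       "application-autoscaling:DescribeScalingPolicies"],
            "Resource": "*"}
read_only = [baseline,
             {"Sid": "ExplicitDenyInference", "Effect": "Deny", "Action": INVOKE, "Resource": "*"}]
read_only_invoke = [baseline,
                    {"Sid": "InvokeAllEnvironments", "Effect": "Allow", "Action": INVOKE,
                     "Resource": "arn:aws:sagemaker:*:*:endpoint/*"}]

print(diff_by_sid(read_only, read_only_invoke))
# (['ExplicitDenyInference'], ['InvokeAllEnvironments'], [])
```

With real documents, you would json.load each policy's statement array from wherever your policies are stored. A non-empty "changed" list would flag baseline drift between the two levels.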
Cost Consideration: Unlike Level 1, this level incurs inference costs per invocation. Use API throttling and service quotas to manage cost risk rather than IAM restrictions.
Level 2: dev-invoke
Purpose: Test models in development and staging environments
Principal: Human (data scientists, ML engineers)
Typical Users:
Data scientists (A/B testing)
ML engineers (model validation)
QA teams (integration testing)
Development applications
How It Differs from read-only (Level 1):
Adds invoke: can send prediction requests to sandbox, dev, and staging endpoints
Adds model registration: can register trained models in SageMaker Model Registry
Environment-scoped: restricted to endpoints whose names match *-sandbox-*, *-dev-*, or *-staging-* (for example {prefix}-dev-* and {prefix}-staging-*)
Sandbox/dev/staging endpoint lifecycle: can create, modify, and delete endpoints in non-production environments
Production endpoints blocked: explicit deny on lifecycle actions for *-prod-* endpoints and endpoint configs
What You Can Do:
✅ Everything in read-only, PLUS:
✅ Invoke dev endpoints for testing
✅ Invoke staging endpoints for validation
✅ Send test prediction requests
✅ Validate model responses
✅ Register trained models in SageMaker Model Registry
✅ Create model package groups for organizing model versions
✅ Create model definitions (link artifacts + container)
✅ Create, update, and delete sandbox/dev/staging endpoints
✅ Create endpoint configs for non-production environments
What You Cannot Do:
❌ Invoke production endpoints
❌ Create, modify, or delete production (*-prod-*) endpoints
❌ Create production endpoint configs
❌ Approve or reject model packages (governance responsibility)
Example Scenario:
Marcus is a data scientist who deployed two fraud detection models to the staging environment. He needs to send test transactions to both endpoints to compare their accuracy before promoting the winner to production.
Sample Permissions:
[
{
"Sid": "SageMakerReadOnlyAccess",
"Effect": "Allow",
"Action": [
"sagemaker:ListEndpoints",
"sagemaker:DescribeEndpoint",
"sagemaker:ListEndpointConfigs",
"sagemaker:DescribeEndpointConfig",
"sagemaker:ListModels",
"sagemaker:DescribeModel",
"sagemaker:DescribeModelPackage",
"sagemaker:ListModelPackages",
"sagemaker:GetSearchSuggestions"
],
"Resource": "*"
},
{
"Sid": "SageMakerInvokeDevStagingEndpoints",
"Effect": "Allow",
"Action": [
"sagemaker:InvokeEndpoint",
"sagemaker:InvokeEndpointAsync"
],
"Resource": [
"arn:aws:sagemaker:*:*:endpoint/*-sandbox-*",
"arn:aws:sagemaker:*:*:endpoint/*-dev-*",
"arn:aws:sagemaker:*:*:endpoint/*-staging-*"
]
},
{
"Sid": "EndpointLifecycleNonProd",
"Effect": "Allow",
"Action": [
"sagemaker:CreateEndpoint",
"sagemaker:CreateEndpointConfig",
"sagemaker:UpdateEndpoint",
"sagemaker:DeleteEndpoint",
"sagemaker:DeleteEndpointConfig"
],
"Resource": [
"arn:aws:sagemaker:*:*:endpoint/*-sandbox-*",
"arn:aws:sagemaker:*:*:endpoint/*-dev-*",
"arn:aws:sagemaker:*:*:endpoint/*-staging-*",
"arn:aws:sagemaker:*:*:endpoint-config/*-sandbox-*",
"arn:aws:sagemaker:*:*:endpoint-config/*-dev-*",
"arn:aws:sagemaker:*:*:endpoint-config/*-staging-*"
]
},
{
"Sid": "ModelRegistration",
"Effect": "Allow",
"Action": [
"sagemaker:CreateModel",
"sagemaker:CreateModelPackage",
"sagemaker:CreateModelPackageGroup"
],
"Resource": "*"
},
{
"Sid": "ExplicitDenyProductionAndLifecycle",
"Effect": "Deny",
"Action": [
"sagemaker:CreateEndpoint",
"sagemaker:CreateEndpointConfig",
"sagemaker:UpdateEndpoint",
"sagemaker:DeleteEndpoint",
"sagemaker:DeleteEndpointConfig"
],
"Resource": [
"arn:aws:sagemaker:*:*:endpoint/*-prod-*",
"arn:aws:sagemaker:*:*:endpoint-config/*-prod-*"
]
}
]
"Sid": "SageMakerReadOnlyAccess"
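Grants account-wide, view-only access to SageMaker hosting resources (endpoints, endpoint configurations, models, and model packages), plus search suggestions.
| Policy Action | Description |
|---|---|
| sagemaker:DescribeEndpoint, sagemaker:DescribeEndpointConfig, sagemaker:DescribeModel, sagemaker:DescribeModelPackage | Retrieve detailed information about an endpoint, endpoint configuration, model, or model package. |
| sagemaker:ListEndpoints, sagemaker:ListEndpointConfigs, sagemaker:ListModels, sagemaker:ListModelPackages | List resources of each type to see what exists in the environment. |
| sagemaker:GetSearchSuggestions | Provides auto-complete suggestions for SageMaker search queries. |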
"Sid": "SageMakerInvokeDevStagingEndpoints"
Allows the user to send data to SageMaker endpoints whose names contain "-sandbox-", "-dev-", or "-staging-".
| Policy Action | Description |
|---|---|
| sagemaker:InvokeEndpoint | Sends a real-time inference request to a running endpoint to get a prediction. |
| sagemaker:InvokeEndpointAsync | Sends an inference request to an asynchronous endpoint (used for large payloads or long processing times). |
"Sid": "EndpointLifecycleNonProd"
Allows creating, updating, and deleting endpoints and endpoint configurations in non-production environments (sandbox, dev, staging). This enables ML engineers and data scientists to test real-time inference latency, validate inference logic, and iterate on endpoint configurations before handing off to MLOps for production deployment.
| Policy Action | Description |
|---|---|
| sagemaker:CreateEndpoint | Creates a new endpoint using a specific endpoint configuration. Scoped to sandbox/dev/staging naming patterns. |
| sagemaker:CreateEndpointConfig | Defines the hardware (instance type, count) and model specifications for an endpoint. Scoped to non-production. |
| sagemaker:UpdateEndpoint | Deploys a new model or configuration to an existing non-production endpoint. |
| sagemaker:DeleteEndpoint | Shuts down and removes a non-production endpoint to stop incurring costs. |
| sagemaker:DeleteEndpointConfig | Removes an endpoint configuration that is no longer needed in non-production. |
"Sid": "ModelRegistration"
Allows data scientists to register trained models in the SageMaker Model Registry after training. This is a development activity: the model sits in the registry awaiting approval from MLOps or governance teams before production deployment.
| Policy Action | Description |
|---|---|
| sagemaker:CreateModel | Creates a model definition in SageMaker by specifying the Docker container image, model artifacts (from S3), and inference code. This does not deploy the model; it only defines it. Because a model definition names an execution role, CreateModel also needs iam:PassRole on that role, which this sample does not grant. |
| sagemaker:CreateModelPackage | Registers a versioned model package in the Model Registry. This is the primary action for submitting a trained model for review and approval. |
| sagemaker:CreateModelPackageGroup | Creates a model package group to organize related model versions (e.g., all versions of a fraud detection model). Typically done once per model project. |
"Sid": "ExplicitDenyProductionAndLifecycle"
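A strict guardrail that blocks lifecycle actions (create, update, delete) on production (*-prod-*) endpoints and their endpoint configurations, even if another policy allows them. Non-production environments (sandbox, dev, staging) are unaffected. Invoking production endpoints is not explicitly denied here. It is blocked implicitly, because no statement in this policy allows it.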
| Policy Action Denied | Description |
|---|---|
| sagemaker:CreateEndpoint | (Denied) Creates a new endpoint using a specific endpoint configuration. |
| sagemaker:CreateEndpointConfig | (Denied) Defines the hardware and model specifications for an endpoint. |
| sagemaker:UpdateEndpoint | (Denied) Deploys a new model or configuration to an existing endpoint. |
| sagemaker:DeleteEndpoint | (Denied) Shuts down and removes a production endpoint to stop incurring costs. |
| sagemaker:DeleteEndpointConfig | (Denied) Removes a production endpoint configuration. |
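Both the Allow and Deny statements at this level are name-based, so it helps to check how specific endpoint names resolve. The sketch below runs a simplified evaluator on trimmed copies of the dev-invoke statements, which keep only the endpoint ARNs and lifecycle actions. In this evaluator, an explicit Deny beats an Allow, and no match means an implicit deny. Conditions and other policy types are ignored, and the endpoint names are hypothetical. Two results are worth noticing. A name containing both -prod- and -dev- is denied. A name like acme-dev, with no trailing segment, matches none of the *-dev-* patterns.

```python
import re

def _match(pattern, value, ignore_case=False):
    # '*' -> any run of characters, '?' -> exactly one; everything else literal.
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, re.IGNORECASE if ignore_case else 0) is not None

def _as_list(value):
    return [value] if isinstance(value, str) else value

def evaluate(statements, action, resource):
    """Simplified decision: explicit Deny beats Allow; no match means implicit deny."""
    hits = [s for s in statements
            if any(_match(a, action, ignore_case=True) for a in _as_list(s["Action"]))
            and any(_match(r, resource) for r in _as_list(s["Resource"]))]
    if any(s["Effect"] == "Deny" for s in hits):
        return "ExplicitDeny"
    return "Allow" if hits else "ImplicitDeny"

NON_PROD = ["arn:aws:sagemaker:*:*:endpoint/*-sandbox-*",
            "arn:aws:sagemaker:*:*:endpoint/*-dev-*",
            "arn:aws:sagemaker:*:*:endpoint/*-staging-*"]
LIFECYCLE = ["sagemaker:CreateEndpoint", "sagemaker:UpdateEndpoint", "sagemaker:DeleteEndpoint"]

# Trimmed copy of the dev-invoke statements (endpoint ARNs only).
dev_invoke = [
    {"Effect": "Allow", "Action": ["sagemaker:InvokeEndpoint", "sagemaker:InvokeEndpointAsync"],
     "Resource": NON_PROD},
    {"Effect": "Allow", "Action": LIFECYCLE, "Resource": NON_PROD},
    {"Effect": "Deny", "Action": LIFECYCLE, "Resource": ["arn:aws:sagemaker:*:*:endpoint/*-prod-*"]},
]

def endpoint(name):
    # Hypothetical region and account ID.
    return f"arn:aws:sagemaker:us-east-1:111122223333:endpoint/{name}"

cases = [
    ("sagemaker:CreateEndpoint", "acme-dev-fraud-detection-v2"),  # Allow
    ("sagemaker:CreateEndpoint", "acme-prod-fraud-detection"),    # ExplicitDeny
    ("sagemaker:CreateEndpoint", "acme-prod-dev-experiment"),     # ExplicitDeny: Deny wins
    ("sagemaker:CreateEndpoint", "acme-dev"),                     # ImplicitDeny: no "-dev-"
    ("sagemaker:InvokeEndpoint", "acme-prod-fraud-detection"),    # ImplicitDeny: never allowed
]
for action, name in cases:
    print(f"{action:<26}{name:<30}{evaluate(dev_invoke, action, endpoint(name))}")
```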
Python Example:
import boto3
import json
runtime = boto3.client('sagemaker-runtime')
# Test a dev endpoint (allowed: the name matches *-dev-*)
response = runtime.invoke_endpoint(
    EndpointName='acme-dev-fraud-detection-v2',
    ContentType='application/json',
    Body=json.dumps({
        'transaction_amount': 1500.00,
        'merchant_category': 'electronics'
    })
)
prediction = json.loads(response['Body'].read())
print(f"Fraud probability: {prediction['fraud_score']}")
Safety Feature: Testers cannot accidentally invoke production endpoints, which prevents costly mistakes and data contamination.
Level 3: prod-invoke
Purpose: Production applications invoking production models
Principal: Machine (backend services, APIs) or Human (production support)
Typical Users:
Backend API services
Production web applications
Mobile app backends
Real-time fraud detection systems
Customer-facing chatbots
How It Differs from dev-invoke (Level 2):
Switches environment scope: production endpoints only, no dev/staging
Typically assigned to service roles: production apps use IAM roles, not user credentials
Higher accountability: every invocation serves real customers
What You Can Do:
✅ Everything in read-only, PLUS:
✅ Invoke production endpoints only
✅ Send real customer prediction requests
✅ Receive model responses for business logic
What You Cannot Do:
❌ Invoke dev or staging endpoints
❌ Create or modify endpoints
❌ Delete endpoints
Example Scenario:
The fraud detection API service receives transaction requests from the payment gateway. For each transaction, it calls the production fraud model endpoint and blocks transactions with fraud scores above 0.85.
Sample Permissions:
[
{
"Sid": "AllowProductionEndpointDiscovery",
"Effect": "Allow",
"Action": [
"sagemaker:DescribeEndpoint",
"sagemaker:ListEndpoints"
],
"Resource": "*"
},
{
"Sid": "AllowProductionModelInvocation",
"Effect": "Allow",
"Action": [
"sagemaker:InvokeEndpoint",
"sagemaker:InvokeEndpointAsync"
],
"Resource": "arn:aws:sagemaker:*:*:endpoint/*-prod-*"
},
{
"Sid": "DenyNonProductionInvocation",
"Effect": "Deny",
"Action": [
"sagemaker:InvokeEndpoint",
"sagemaker:InvokeEndpointAsync"
],
"NotResource": "arn:aws:sagemaker:*:*:endpoint/*-prod-*"
},
{
"Sid": "DenyEndpointModifications",
"Effect": "Deny",
"Action": [
"sagemaker:CreateEndpoint",
"sagemaker:UpdateEndpoint",
"sagemaker:DeleteEndpoint",
"sagemaker:CreateEndpointConfig",
"sagemaker:DeleteEndpointConfig"
],
"Resource": "*"
}
]
"Sid": "AllowProductionEndpointDiscovery"
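Enables the principal to find endpoints and view their status. The Resource is "*", so discovery covers endpoints in every environment, not only those that follow the -prod- naming convention. Only invocation is restricted to *-prod-*.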
| Policy Action | Description |
|---|---|
| sagemaker:DescribeEndpoint | Returns detailed information about an endpoint, such as its current status and configuration name. |
| sagemaker:ListEndpoints | Lists the SageMaker endpoints in the account, allowing the user to see what is available. |
"Sid": "AllowProductionModelInvocation"
Grants the core permission to send data to and receive predictions from production-ready models.
| Policy Action | Description |
|---|---|
| sagemaker:InvokeEndpoint | Sends a synchronous request to an endpoint for low-latency, real-time machine learning inferences. |
| sagemaker:InvokeEndpointAsync | Sends an inference request to an asynchronous endpoint, suitable for large payloads or long processing times. |
"Sid": "DenyNonProductionInvocation"
A guardrail that explicitly denies invocation of any endpoint whose ARN does not match *-prod-*. This prevents accidental cross-environment calls, even if another policy attached to the same principal would allow them.
| Policy Action Denied | Description |
|---|---|
| sagemaker:InvokeEndpoint | (Denied) Sends a synchronous request to an endpoint for low-latency, real-time machine learning inferences. |
| sagemaker:InvokeEndpointAsync | (Denied) Sends an inference request to an asynchronous endpoint, suitable for large payloads or long processing times. |
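The NotResource form is easy to misread, so here is a simplified evaluator extended to handle it. A Deny with NotResource applies to every resource that does not match the listed pattern. The statements are trimmed copies of the prod-invoke sample. The endpoint names are hypothetical, and so is the extra staging grant, which stands in for some other policy attached to the same role. This is a teaching model, not the IAM engine. It ignores Conditions, SCPs, and boundaries.

```python
import re

def _match(pattern, value, ignore_case=False):
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, re.IGNORECASE if ignore_case else 0) is not None

def _as_list(value):
    return [value] if isinstance(value, str) else value

def _applies(stmt, action, resource):
    if not any(_match(a, action, ignore_case=True) for a in _as_list(stmt["Action"])):
        return False
    if "NotResource" in stmt:
        # NotResource: the statement covers everything EXCEPT the listed ARNs.
        return not any(_match(r, resource) for r in _as_list(stmt["NotResource"]))
    return any(_match(r, resource) for r in _as_list(stmt["Resource"]))

def evaluate(statements, action, resource):
    hits = [s for s in statements if _applies(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in hits):
        return "ExplicitDeny"
    return "Allow" if hits else "ImplicitDeny"

PROD = "arn:aws:sagemaker:*:*:endpoint/*-prod-*"
INVOKE = ["sagemaker:InvokeEndpoint", "sagemaker:InvokeEndpointAsync"]
prod_invoke = [
    {"Sid": "AllowProductionModelInvocation", "Effect": "Allow", "Action": INVOKE, "Resource": PROD},
    {"Sid": "DenyNonProductionInvocation", "Effect": "Deny", "Action": INVOKE, "NotResource": PROD},
    {"Sid": "DenyEndpointModifications", "Effect": "Deny",
     "Action": ["sagemaker:CreateEndpoint", "sagemaker:UpdateEndpoint", "sagemaker:DeleteEndpoint"],
     "Resource": "*"},
]
# Hypothetical extra policy on the same role that tries to allow staging invocation.
extra = [{"Sid": "StagingGrant", "Effect": "Allow", "Action": INVOKE,
          "Resource": "arn:aws:sagemaker:*:*:endpoint/*-staging-*"}]

def endpoint(name):
    return f"arn:aws:sagemaker:us-east-1:111122223333:endpoint/{name}"

print(evaluate(prod_invoke, "sagemaker:InvokeEndpoint", endpoint("acme-prod-fraud-detection")))              # Allow
print(evaluate(prod_invoke + extra, "sagemaker:InvokeEndpoint", endpoint("acme-staging-fraud-detection")))    # ExplicitDeny
print(evaluate(prod_invoke, "sagemaker:UpdateEndpoint", endpoint("acme-prod-fraud-detection")))               # ExplicitDeny
```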
"Sid": "DenyEndpointModifications"
Ensures the principal cannot change the serving infrastructure (creating, updating, or deleting endpoints and endpoint configurations), which keeps the environment stable.
| Policy Action Denied | Description |
|---|---|
| sagemaker:CreateEndpoint | (Denied) Creates a new SageMaker endpoint using a specific endpoint configuration. |
| sagemaker:UpdateEndpoint | (Denied) Deploys a new endpoint configuration to an existing endpoint without taking it offline. |
| sagemaker:DeleteEndpoint | (Denied) Permanently removes an existing SageMaker endpoint and stops the associated hosting instances. |
| sagemaker:CreateEndpointConfig | (Denied) Defines a setup for an endpoint, specifying which models to deploy and the hardware instance types to use. |
| sagemaker:DeleteEndpointConfig | (Denied) Deletes a previously created endpoint configuration. |
API Gateway Integration:
import boto3
import json
from flask import Flask, request, jsonify
app = Flask(__name__)
runtime = boto3.client('sagemaker-runtime')
@app.route('/check-fraud', methods=['POST'])
def check_fraud():
    transaction = request.json
    # Call production endpoint
    response = runtime.invoke_endpoint(
        EndpointName='acme-prod-fraud-detection',
        ContentType='application/json',
        Body=json.dumps(transaction)
    )
    prediction = json.loads(response['Body'].read())
    return jsonify({
        'transaction_id': transaction['id'],
        'fraud_score': prediction['fraud_score'],
        'action': 'block' if prediction['fraud_score'] > 0.85 else 'approve'
    })
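Pulling the scoring and decision logic out of the Flask route makes it testable without a live endpoint. This is a minimal sketch. It assumes the same response format as the example above (a JSON body with a fraud_score field) and a sagemaker-runtime client passed in as a parameter. The function names are hypothetical. Scores of exactly 0.85 are approved, matching the strict > comparison in the route.

```python
import json
from typing import Any, Dict

FRAUD_THRESHOLD = 0.85  # same cut-off as the example scenario

def decide(fraud_score: float, threshold: float = FRAUD_THRESHOLD) -> str:
    """Block scores strictly above the threshold, approve everything else."""
    return "block" if fraud_score > threshold else "approve"

def score_transaction(runtime: Any, endpoint_name: str,
                      transaction: Dict[str, Any]) -> Dict[str, Any]:
    """Invoke the endpoint and build the API response body (hypothetical helper).

    `runtime` is expected to be boto3.client("sagemaker-runtime"); the endpoint is
    assumed to return JSON containing a 'fraud_score' field.
    """
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(transaction),
    )
    prediction = json.loads(response["Body"].read())
    score = float(prediction["fraud_score"])
    return {
        "transaction_id": transaction["id"],
        "fraud_score": score,
        "action": decide(score),
    }
```

Inside the route, check_fraud() could then reduce to return jsonify(score_transaction(runtime, 'acme-prod-fraud-detection', request.json)).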
Security Benefit: Production applications cannot call unstable dev/staging endpoints, which protects reliability and data integrity.
Level 4: full
Purpose: Complete endpoint lifecycle management across all environments
Principal: Human (MLOps engineers, platform admins)
Typical Users:
MLOps engineers
Platform administrators
Deployment automation (CI/CD)
Infrastructure team
How It Differs from prod-invoke (Level 3):
Adds lifecycle management: create, update, configure, and delete endpoints
Adds autoscaling control: configure scaling policies and instance counts
Account-wide scope: Resource: * across all environments
Includes delete: can decommission obsolete endpoints
What You Can Do:
✅ Everything in prod-invoke, PLUS:
✅ Create new endpoints
✅ Update endpoint configurations
✅ Deploy new model versions
✅ Configure autoscaling policies
✅ Delete obsolete endpoints
✅ Manage endpoint tags
What You Cannot Do:
❌ Nothing: this is full endpoint management
Example Scenario:
The MLOps team needs to deploy a new fraud detection model to production. They create an endpoint configuration with the new model, create the endpoint with 2 instances, enable autoscaling, and gradually shift traffic from the old endpoint using blue/green deployment.
Sample Permissions:
[
  {
    "Sid": "SageMakerEndpointLifecycleManagement",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:UpdateEndpoint",
      "sagemaker:UpdateEndpointWeightsAndCapacities",
      "sagemaker:DeleteEndpoint",
      "sagemaker:DeleteEndpointConfig",
      "sagemaker:DescribeEndpoint",
      "sagemaker:DescribeEndpointConfig",
      "sagemaker:ListEndpoints",
      "sagemaker:ListEndpointConfigs"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ModelManagement",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateModel",
      "sagemaker:DescribeModel",
      "sagemaker:DeleteModel",
      "sagemaker:ListModels"
    ],
    "Resource": "*"
  },
  {
    "Sid": "InferenceExecution",
    "Effect": "Allow",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AutoscalingAndMonitoring",
    "Effect": "Allow",
    "Action": [
      "application-autoscaling:RegisterScalableTarget",
      "application-autoscaling:DeregisterScalableTarget",
      "application-autoscaling:PutScalingPolicy",
      "application-autoscaling:DeleteScalingPolicy",
      "application-autoscaling:DescribeScalableTargets",
      "application-autoscaling:DescribeScalingPolicies",
      "cloudwatch:PutMetricAlarm",
      "cloudwatch:DescribeAlarms",
      "cloudwatch:DeleteAlarms"
    ],
    "Resource": "*"
  },
  {
    "Sid": "TaggingAndMetadata",
    "Effect": "Allow",
    "Action": [
      "sagemaker:AddTags",
      "sagemaker:DeleteTags",
      "sagemaker:ListTags"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToSageMaker",
    "Effect": "Allow",
    "Action": [
      "iam:PassRole"
    ],
    "Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "sagemaker.amazonaws.com"
      }
    }
  },
  {
    "Sid": "ExplicitDenyDataDeletion",
    "Effect": "Deny",
    "Action": [
      "sagemaker:DeleteDomain",
      "sagemaker:DeleteUserProfile"
    ],
    "Resource": "*"
  }
]
"Sid": "SageMakerEndpointLifecycleManagement"
Controls the core lifecycle of hosting, including creating, updating, and deleting the physical endpoints and their configurations. Also includes discovery actions (List/Describe) so admins have full visibility.
| Policy Action | Description |
|---|---|
| sagemaker:CreateEndpoint | Launches the actual HTTPS endpoint based on a specific configuration. The call returns while the endpoint is still "Creating"; once provisioning completes, its status becomes "InService" and it is ready to process inference requests. |
| sagemaker:CreateEndpointConfig | Defines the configuration for a model deployment. It acts as a "blueprint" that specifies exactly how SageMaker should host your machine learning model before you create the live endpoint. |
| sagemaker:UpdateEndpoint | Switches an endpoint to a new configuration (e.g., rolling out a new model version). This is typically used for blue/green deployments to swap a model version or change instance types without downtime. |
| sagemaker:UpdateEndpointWeightsAndCapacities | Adjusts the traffic distribution and instance counts of the models (production variants) hosted on an active endpoint. Unlike UpdateEndpoint, which moves the endpoint to a different endpoint configuration, this operation makes in-place adjustments to the existing variants without changing the endpoint configuration. |
| sagemaker:DeleteEndpoint | Shuts down the endpoint's hosting instances so they stop incurring charges. This action does not delete the endpoint configuration or the models themselves. |
| sagemaker:DeleteEndpointConfig | Permanently removes the specified endpoint configuration blueprint. Do not delete a configuration that is in use by a live endpoint or by an endpoint that is being created or updated; doing so can leave you without visibility into the instance type the endpoint runs on. |
| sagemaker:DescribeEndpoint | Views the current status and details of a live SageMaker endpoint. |
| sagemaker:DescribeEndpointConfig | Retrieves the specific settings defined in an endpoint configuration. |
| sagemaker:ListEndpoints | Lists all endpoints in the account for platform-wide visibility. |
| sagemaker:ListEndpointConfigs | Lists all endpoint configurations to browse existing blueprints. |
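UpdateEndpointWeightsAndCapacities is the action that makes a gradual traffic shift possible when two model versions run as production variants on the same endpoint. The sketch below is a minimal illustration, assuming two hypothetical variants named "Blue" and "Green" and equal-sized steps; it builds the request for each step and applies it through a boto3 SageMaker client passed in by the caller. Only Level 4 (full) can run it, since deploy-only explicitly denies this action.

```python
from typing import Any, Dict, List


def weight_schedule(steps: int) -> List[Dict[str, float]]:
    """Split traffic between two variants in equal increments.

    Returns one {variant: weight} mapping per step, ending with all traffic
    on "Green". The variant names are hypothetical placeholders.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    plan = []
    for i in range(1, steps + 1):
        green = round(i / steps, 4)
        plan.append({"Blue": round(1.0 - green, 4), "Green": green})
    return plan


def to_request(endpoint_name: str, weights: Dict[str, float]) -> Dict[str, Any]:
    """Build kwargs for sagemaker.update_endpoint_weights_and_capacities()."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": name, "DesiredWeight": weight}
            for name, weight in sorted(weights.items())
        ],
    }


def shift_traffic(client: Any, endpoint_name: str, steps: int = 4) -> None:
    """Apply each step, then wait for the endpoint to return to InService.

    `client` is expected to be boto3.client('sagemaker'). Metric checks and
    rollback between steps are intentionally left out of this sketch.
    """
    waiter = client.get_waiter("endpoint_in_service")
    for weights in weight_schedule(steps):
        client.update_endpoint_weights_and_capacities(**to_request(endpoint_name, weights))
        waiter.wait(EndpointName=endpoint_name)
```

For example, `shift_traffic(boto3.client('sagemaker'), 'acme-prod-fraud-detection')` would move traffic in 25% increments. A real rollout would also check CloudWatch metrics between steps and roll back on regressions.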
"Sid": "ModelManagement"
Allows the definition of the software/model artifacts that the endpoints will run.
| Policy Action | Description |
|---|---|
| sagemaker:CreateModel | Grants permission to create a model in SageMaker. This process involves naming the model and specifying the Docker container image, model artifacts (usually from S3), and inference code required for deployment. |
| sagemaker:DescribeModel | Grants permission to view the details of a specific model. This returns information about the model's configuration, such as the primary container, execution role, and creation time. |
| sagemaker:DeleteModel | Grants permission to delete a model resource. This action only removes the model entry in SageMaker; it does not delete the underlying model artifacts in S3 or the associated IAM roles. |
| sagemaker:ListModels | Lists all models in the account. Admins need this to discover and audit models before associating them with endpoint configurations. |
"Sid": "InferenceExecution"
Grants the ability to send data to endpoints across all environments. At Level 4 there is no environment restriction: admins need to invoke any endpoint for testing, validation, and troubleshooting.
| Policy Action | Description |
|---|---|
| sagemaker:InvokeEndpoint | Sends data to a real-time endpoint for a prediction/inference response. |
| sagemaker:InvokeEndpointAsync | Sends data to an asynchronous endpoint for inference. Unlike a real-time request, the model processes the data in the background and saves the prediction result to an S3 bucket rather than returning it immediately. |
"Sid": "AutoscalingAndMonitoring"
Manages the horizontal scaling rules (adding/removing instances) based on traffic demand.
| Policy Action | Description |
|---|---|
| application-autoscaling:RegisterScalableTarget | Registers an AWS or custom resource as a scalable target, allowing Application Auto Scaling to manage it. It also sets or updates the minimum and maximum capacity limits. |
| application-autoscaling:DeregisterScalableTarget | Removes a resource from being a scalable target. This action also deletes all associated scaling policies and scheduled actions for that resource. |
| application-autoscaling:PutScalingPolicy | Creates or updates a scaling policy (for example, target tracking or step scaling) for a registered scalable target to automate capacity adjustments. |
| application-autoscaling:DeleteScalingPolicy | Deletes a specific scaling policy. For target tracking, it also removes the CloudWatch alarms created on your behalf; for step scaling, it deletes the alarm action but not the alarm itself. |
| application-autoscaling:DescribeScalableTargets | Retrieves detailed information about one or more scalable targets in a specified service namespace, including their current capacity limits. |
| application-autoscaling:DescribeScalingPolicies | Returns information about the scaling policies for the specified service namespace and scalable targets. |
| cloudwatch:PutMetricAlarm | Creates or updates an alarm and associates it with a specific metric. In an autoscaling context, these alarms trigger the scaling policies when thresholds are breached. |
| cloudwatch:DescribeAlarms | Retrieves information about specified alarms. It is often used to verify the status or configuration of alarms used by autoscaling policies. |
| cloudwatch:DeleteAlarms | Deletes the specified alarms. This is used during cleanup to ensure that unused CloudWatch alarms are removed after a scaling policy or resource is deleted. |
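As a sketch of how these actions fit together for a SageMaker endpoint, the helpers below build the requests for RegisterScalableTarget and a target-tracking PutScalingPolicy on one production variant. The policy naming scheme and the target value are hypothetical; the ResourceId format, the sagemaker:variant:DesiredInstanceCount dimension, and the SageMakerVariantInvocationsPerInstance metric follow Application Auto Scaling's SageMaker conventions. The client is passed in by the caller.

```python
from typing import Any, Dict


def scalable_target_request(endpoint: str, variant: str, min_cap: int, max_cap: int) -> Dict[str, Any]:
    """Kwargs for application-autoscaling register_scalable_target()."""
    if min_cap < 0 or max_cap < min_cap or max_cap < 1:
        raise ValueError("need 0 <= min_cap <= max_cap and max_cap >= 1")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_cap,
        "MaxCapacity": max_cap,
    }


def target_tracking_request(endpoint: str, variant: str, invocations_per_instance: float) -> Dict[str, Any]:
    """Kwargs for put_scaling_policy() using the predefined SageMaker invocation metric."""
    return {
        "PolicyName": f"{endpoint}-{variant}-invocations",  # hypothetical naming scheme
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }


def enable_autoscaling(client: Any, endpoint: str, variant: str,
                       min_cap: int, max_cap: int, target: float) -> None:
    """`client` is expected to be boto3.client('application-autoscaling')."""
    # The target must be registered before a policy can be attached to it.
    client.register_scalable_target(**scalable_target_request(endpoint, variant, min_cap, max_cap))
    client.put_scaling_policy(**target_tracking_request(endpoint, variant, target))
```

For example, `enable_autoscaling(boto3.client('application-autoscaling'), 'acme-prod-fraud-detection', 'AllTraffic', 2, 6, 70.0)` would keep the variant between two and six instances while targeting roughly 70 invocations per instance.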
"Sid": "TaggingAndMetadata"
Enables resource organization, cost tracking, and access control via metadata tags.
| Policy Action | Description |
|---|---|
| sagemaker:AddTags | Grants permission to add or overwrite one or more tags for a specified SageMaker resource (e.g., notebook instances, models, or training jobs). |
| sagemaker:DeleteTags | Grants permission to remove one or more specific tags from a SageMaker resource. |
| sagemaker:ListTags | Grants permission to view/list all tags currently associated with a specific SageMaker resource. |
"Sid": "PassRoleToSageMaker"
Required for CreateModel: SageMaker needs an execution role to pull model artifacts from S3 and write logs. Scoped to roles matching the platform's naming convention and conditioned to SageMaker only, preventing privilege escalation to other services.
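| Policy Action | Description |
|---|---|
| iam:PassRole | Assigns IAM execution roles to SageMaker models. Scoped to roles matching the platform's naming convention (the statement's Resource pattern) and only when the role is passed to SageMaker. |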
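To see which roles the Resource pattern in this statement admits, the sketch below fills the placeholders and tests candidate role ARNs. It is a simplified matcher, assuming hypothetical values `acme` and `prod` for `{company_prefix}` and `{env}`: it treats `*` as any run of characters (including `:` and `/`) and `?` as exactly one character, which approximates but does not exactly reproduce IAM's own ARN matching, and it ignores the `iam:PassedToService` condition.

```python
import re
from typing import Dict


def render_pattern(template: str, values: Dict[str, str]) -> str:
    """Fill {placeholders} such as {company_prefix} and {env}."""
    return template.format(**values)


def iam_wildcard_match(pattern: str, arn: str) -> bool:
    """Simplified IAM-style match: '*' = any characters, '?' = one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, arn) is not None


# Hypothetical tenant values for illustration only.
PATTERN = render_pattern(
    "arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*",
    {"company_prefix": "acme", "env": "prod"},
)
```

With these values, a role such as `acme-prod-sagemaker-role-exec` matches, while `acme-dev-...` roles or an unrelated `AdminRole` do not, which is the escalation barrier the statement relies on.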
"Sid": "ExplicitDenyDataDeletion"
Safety net to protect the SageMaker Studio environment itself. Even with full endpoint management, admins should not accidentally destroy the shared platform infrastructure. An explicit Deny ensures no other policy can override this protection.
| Policy Action Denied | Description |
|---|---|
| sagemaker:DeleteDomain | Prevents the accidental deletion of the entire SageMaker Studio environment, which includes all user settings and shared resources. |
| sagemaker:DeleteUserProfile | Prevents the deletion of individual user profiles. Deleting a profile causes the user to lose access to their associated data, notebooks, and artifacts stored in their EFS volume. |
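The override rule can be illustrated with a deliberately simplified evaluator: any matching Deny wins, otherwise any matching Allow grants access, and anything unmatched is implicitly denied. This is a teaching sketch, not AWS's full evaluation logic; it skips statements that carry a Condition and ignores NotAction/NotResource, SCPs, permission boundaries, and resource-based policies.

```python
import re
from typing import Any, Dict, List


def _wildcard(pattern: str, value: str, ignore_case: bool) -> bool:
    """IAM-style wildcard match: '*' = any characters, '?' = exactly one."""
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, re.IGNORECASE if ignore_case else 0) is not None


def _as_list(value: Any) -> List[str]:
    """Action/Resource may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def evaluate(statements: List[Dict[str, Any]], action: str, resource: str = "*") -> str:
    """Return 'Deny', 'Allow', or 'ImplicitDeny' for one action/resource pair."""
    allowed = False
    for stmt in statements:
        if "Condition" in stmt:
            continue  # conditions are not modeled; skip rather than guess
        # Action names are case-insensitive in IAM; resource ARNs are not.
        if not any(_wildcard(p, action, True) for p in _as_list(stmt["Action"])):
            continue
        if not any(_wildcard(p, resource, False) for p in _as_list(stmt["Resource"])):
            continue
        if stmt["Effect"] == "Deny":
            return "Deny"  # an explicit Deny always wins, regardless of order
        allowed = True
    return "Allow" if allowed else "ImplicitDeny"
```

Fed the Level 4 statements, `evaluate(statements, 'sagemaker:DeleteDomain')` returns `'Deny'` even if a broad, hypothetical `sagemaker:*` Allow is added alongside them.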
Deployment Script:
import boto3
sagemaker = boto3.client('sagemaker')
# Create endpoint configuration (assumes model 'fraud-detection-v3' was already created with CreateModel)
config_name = 'fraud-detection-v3-config'
sagemaker.create_endpoint_config(
    EndpointConfigName=config_name,
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'fraud-detection-v3',
        'InitialInstanceCount': 2,
        'InstanceType': 'ml.m5.xlarge'
    }]
)
# Create endpoint (the call returns while the endpoint is still 'Creating')
endpoint_name = 'acme-prod-fraud-detection'
sagemaker.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=config_name,
    Tags=[
        {'Key': 'Environment', 'Value': 'production'},
        {'Key': 'Model', 'Value': 'fraud-detection'},
        {'Key': 'Version', 'Value': 'v3'}
    ]
)
# Block until the endpoint is InService (the waiter raises if creation fails)
sagemaker.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)
print(f"Endpoint {endpoint_name} is InService")
Security Note: ⚠️ This level should be assigned sparingly. Most users need dev-invoke or prod-invoke.
Level 4-ci: deploy-only
Purpose: Automated deployment of endpoints and models without destructive or traffic-shifting actions
Principal: Machine (CI/CD pipelines, deployment automation)
Typical Users:
CI/CD deployment pipelines (CodePipeline, GitHub Actions, GitLab CI)
Automated model deployment workflows
Infrastructure-as-Code automation (CloudFormation, CDK)
How It Differs from full (Level 4):
No delete actions: pipelines deploy forward, never tear down
No traffic weight shifting: canary/blue-green traffic decisions are a separate human or canary-pipeline concern
Same create/update/invoke scope: full deployment capability across all environments
Machine identity only: assigned to service roles, never to human users
What You Can Do:
✅ Everything in full except delete and traffic-shifting actions:
✅ Create new endpoints and endpoint configurations
✅ Update existing endpoints to new configurations
✅ Register models for deployment
✅ Invoke endpoints across all environments (smoke tests)
✅ Configure autoscaling policies
✅ Tag resources with deployment metadata
✅ Pass execution roles to SageMaker
What You Cannot Do:
❌ Delete endpoints, endpoint configurations, or models
❌ Shift traffic weights between production variants
❌ Delete SageMaker domains or user profiles
Example Scenario:
The CI/CD pipeline receives a merged PR that triggers a model deployment. It creates a new endpoint configuration with the updated model artifact, updates the production endpoint to use the new configuration, configures autoscaling, and runs a smoke test by invoking the endpoint. It cannot delete the old endpoint configuration; that is a separate cleanup job requiring human approval.
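The deployment steps in this scenario map onto a short, forward-only function. It is a sketch under stated assumptions: the model (for example, a hypothetical `fraud-detection-v4`) already exists, configuration names follow a hypothetical `<model>-<short SHA>` convention, and the caller passes in a boto3 SageMaker client. It uses only actions this level allows (CreateEndpointConfig, UpdateEndpoint, and DescribeEndpoint via the waiter); autoscaling and the smoke test would be separate pipeline steps.

```python
from typing import Any


def deploy_forward(client: Any, endpoint_name: str, model_name: str, commit_sha: str,
                   instance_type: str = "ml.m5.xlarge", instance_count: int = 2) -> str:
    """Roll an existing endpoint onto a new configuration; return the config name.

    `client` is expected to be boto3.client('sagemaker'). The '<model>-<sha>'
    config naming scheme is a hypothetical convention for this sketch.
    """
    config_name = f"{model_name}-{commit_sha[:8]}"
    client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    )
    # UpdateEndpoint is allowed at this level. The previous config is left in
    # place; deleting it is a separate, human-approved cleanup job.
    client.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return config_name
```

Note that nothing here calls a Delete* or UpdateEndpointWeightsAndCapacities API, so the function keeps working under the DenyDestructiveActions statement.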
Sample Permissions:
[
  {
    "Sid": "SageMakerEndpointDeployment",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:UpdateEndpoint",
      "sagemaker:DescribeEndpoint",
      "sagemaker:DescribeEndpointConfig",
      "sagemaker:ListEndpoints",
      "sagemaker:ListEndpointConfigs"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ModelRegistration",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateModel",
      "sagemaker:CreateModelPackage",
      "sagemaker:CreateModelPackageGroup",
      "sagemaker:DescribeModel",
      "sagemaker:DescribeModelPackage",
      "sagemaker:DescribeModelPackageGroup",
      "sagemaker:ListModels",
      "sagemaker:ListModelPackages",
      "sagemaker:ListModelPackageGroups",
      "sagemaker:UpdateModelPackage"
    ],
    "Resource": "*"
  },
  {
    "Sid": "InferenceExecution",
    "Effect": "Allow",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AutoscalingConfiguration",
    "Effect": "Allow",
    "Action": [
      "application-autoscaling:RegisterScalableTarget",
      "application-autoscaling:PutScalingPolicy",
      "application-autoscaling:DescribeScalableTargets",
      "application-autoscaling:DescribeScalingPolicies",
      "cloudwatch:PutMetricAlarm",
      "cloudwatch:DescribeAlarms"
    ],
    "Resource": "*"
  },
  {
    "Sid": "TaggingAndMetadata",
    "Effect": "Allow",
    "Action": [
      "sagemaker:AddTags",
      "sagemaker:ListTags"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToSageMaker",
    "Effect": "Allow",
    "Action": [
      "iam:PassRole"
    ],
    "Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "sagemaker.amazonaws.com"
      }
    }
  },
  {
    "Sid": "DenyDestructiveActions",
    "Effect": "Deny",
    "Action": [
      "sagemaker:DeleteEndpoint",
      "sagemaker:DeleteEndpointConfig",
      "sagemaker:DeleteModel",
      "sagemaker:DeleteModelPackage",
      "sagemaker:DeleteModelPackageGroup",
      "sagemaker:UpdateEndpointWeightsAndCapacities",
      "sagemaker:DeleteDomain",
      "sagemaker:DeleteUserProfile"
    ],
    "Resource": "*"
  }
]
"Sid": "SageMakerEndpointDeployment"
Grants the core deployment actions: create new endpoints and configurations, update existing endpoints to new configurations, and discover/inspect all endpoints. Excludes delete actions, because teardown is not a CI/CD pipeline responsibility.
| Policy Action | Description |
|---|---|
| sagemaker:CreateEndpoint | Launches a new HTTPS endpoint based on a specific configuration. |
| sagemaker:CreateEndpointConfig | Defines the deployment blueprint specifying model, instance type, and variant configuration. |
| sagemaker:UpdateEndpoint | Switches an endpoint to a new configuration for rolling deployments and model version updates. |
| sagemaker:DescribeEndpoint | Returns detailed information about an endpoint's current status and configuration. |
| sagemaker:DescribeEndpointConfig | Retrieves the settings defined in an endpoint configuration. |
| sagemaker:ListEndpoints | Lists all endpoints in the account for deployment verification. |
| sagemaker:ListEndpointConfigs | Lists all endpoint configurations for blueprint discovery. |
"Sid": "ModelRegistration"
Allows the pipeline to register model artifacts and manage model packages in the SageMaker Model Registry. Includes UpdateModelPackage for automated approval workflows.
| Policy Action | Description |
|---|---|
| sagemaker:CreateModel | Creates a model resource pointing to the container image and S3 model artifacts. |
| sagemaker:CreateModelPackage | Registers a model version in a model package group for versioned tracking. |
| sagemaker:CreateModelPackageGroup | Creates a new model package group to organize model versions. |
| sagemaker:DescribeModel | Views details of a specific model resource. |
| sagemaker:DescribeModelPackage | Views details of a specific model package version. |
| sagemaker:DescribeModelPackageGroup | Views details of a model package group. |
| sagemaker:ListModels | Lists all models in the account. |
| sagemaker:ListModelPackages | Lists model package versions within a group. |
| sagemaker:ListModelPackageGroups | Lists all model package groups. |
| sagemaker:UpdateModelPackage | Updates model package metadata, including approval status for automated promotion workflows. |
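UpdateModelPackage is what lets a pipeline change a model version's approval status once its quality gates pass. A minimal sketch, assuming the gates already ran earlier in the pipeline and the caller passes a boto3 SageMaker client, approves the newest pending version in a group (the group name is up to you and hypothetical here):

```python
from typing import Any, Optional


def approve_latest_pending(client: Any, group_name: str, description: str) -> Optional[str]:
    """Approve the newest PendingManualApproval package in a group.

    Returns the approved package ARN, or None if nothing is pending.
    `client` is expected to be boto3.client('sagemaker').
    """
    resp = client.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="PendingManualApproval",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    packages = resp.get("ModelPackageSummaryList", [])
    if not packages:
        return None
    arn = packages[0]["ModelPackageArn"]
    client.update_model_package(
        ModelPackageArn=arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=description,
    )
    return arn
```

Only ListModelPackages and UpdateModelPackage are needed, both of which are in the ModelRegistration statement.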
"Sid": "InferenceExecution"
Grants invoke access across all environments for post-deployment smoke tests and health checks.
| Policy Action | Description |
|---|---|
| sagemaker:InvokeEndpoint | Sends a synchronous request to an endpoint for real-time inference. |
| sagemaker:InvokeEndpointAsync | Sends an asynchronous inference request for large payloads or long processing. |
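A post-deployment smoke test needs only InvokeEndpoint. The sketch below sends one JSON request through a boto3 sagemaker-runtime client passed in by the caller and fails the pipeline step if the response lacks an expected field; the payload and the field name (for example, `fraud_score`) are hypothetical and depend on your model's contract.

```python
import json
from typing import Any, Dict


def smoke_test(runtime: Any, endpoint_name: str, payload: Dict[str, Any], required_key: str) -> Dict[str, Any]:
    """Invoke once and raise if the response is not the expected JSON shape.

    `runtime` is expected to be boto3.client('sagemaker-runtime').
    """
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    body = json.loads(response["Body"].read())
    if required_key not in body:
        raise RuntimeError(f"{endpoint_name}: response missing '{required_key}': {body}")
    return body
```

A non-zero exit from the pipeline step (the uncaught RuntimeError) is usually enough to stop promotion without needing any extra permissions.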
"Sid": "AutoscalingConfiguration"
Allows the pipeline to configure autoscaling after deployment. Excludes DeregisterScalableTarget and DeleteScalingPolicy, since scaling teardown is a destructive action.
| Policy Action | Description |
|---|---|
| application-autoscaling:RegisterScalableTarget | Registers an endpoint variant as a scalable target with min/max capacity. |
| application-autoscaling:PutScalingPolicy | Creates or updates a scaling policy (for example, target tracking or step scaling). |
| application-autoscaling:DescribeScalableTargets | Retrieves information about registered scalable targets. |
| application-autoscaling:DescribeScalingPolicies | Returns information about scaling policies for verification. |
| cloudwatch:PutMetricAlarm | Creates alarms that trigger scaling policies when thresholds are breached. |
| cloudwatch:DescribeAlarms | Retrieves alarm status for deployment verification. |
"Sid": "TaggingAndMetadata"
Allows the pipeline to tag deployed resources with deployment metadata (commit hash, pipeline run ID, version). Excludes DeleteTags, because tag cleanup is not a deployment concern.
| Policy Action | Description |
|---|---|
| sagemaker:AddTags | Adds or overwrites tags on SageMaker resources for tracking and cost allocation. |
| sagemaker:ListTags | Lists tags on a resource for verification after tagging. |
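Tagging with deployment metadata might look like the sketch below. It assumes GitHub Actions, reading its default `GITHUB_SHA` and `GITHUB_RUN_ID` variables (other CI systems expose equivalents under different names), and the tag keys `DeployedCommit`, `PipelineRunId`, and `ModelVersion` are hypothetical. Because AddTags overwrites existing keys, rerunning a deployment simply refreshes the values.

```python
import os
from typing import Any, Dict, List, Mapping, Optional


def deployment_tags(env: Mapping[str, str], version: str) -> List[Dict[str, str]]:
    """Build SageMaker tags from CI metadata (tag keys are hypothetical)."""
    tags = {
        "DeployedCommit": env.get("GITHUB_SHA", "unknown"),
        "PipelineRunId": env.get("GITHUB_RUN_ID", "unknown"),
        "ModelVersion": version,
    }
    return [{"Key": k, "Value": v} for k, v in tags.items()]


def tag_resource(client: Any, resource_arn: str, version: str,
                 env: Optional[Mapping[str, str]] = None) -> None:
    """`client` is expected to be boto3.client('sagemaker'); defaults to the process environment."""
    source = env if env is not None else os.environ
    client.add_tags(ResourceArn=resource_arn, Tags=deployment_tags(source, version))
```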
"Sid": "PassRoleToSageMaker"
Required for CreateModel: SageMaker needs an execution role to pull model artifacts from S3 and write logs. Scoped to roles matching the platform's naming convention and conditioned to SageMaker only.
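| Policy Action | Description |
|---|---|
| iam:PassRole | Assigns IAM execution roles to SageMaker models. Scoped to roles matching the platform's naming convention (the statement's Resource pattern) and only when the role is passed to SageMaker. |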
"Sid": "DenyDestructiveActions"
Explicit deny on all destructive and traffic-shifting actions. This is the core guardrail that differentiates level4-ci from level4: CI/CD pipelines deploy forward, while teardown and traffic shifting require separate authorization.
| Policy Action Denied | Description |
|---|---|
| sagemaker:DeleteEndpoint | (Denied) Prevents the pipeline from removing production endpoints. |
| sagemaker:DeleteEndpointConfig | (Denied) Prevents the pipeline from removing endpoint configuration blueprints. |
| sagemaker:DeleteModel | (Denied) Prevents the pipeline from removing model resources. |
| sagemaker:DeleteModelPackage | (Denied) Prevents the pipeline from removing model package versions. |
| sagemaker:DeleteModelPackageGroup | (Denied) Prevents the pipeline from removing model package groups. |
| sagemaker:UpdateEndpointWeightsAndCapacities | (Denied) Prevents the pipeline from shifting traffic between production variants. Traffic decisions should be a separate human or canary-pipeline concern. |
| sagemaker:DeleteDomain | (Denied) Prevents accidental deletion of the SageMaker Studio environment. |
| sagemaker:DeleteUserProfile | (Denied) Prevents deletion of individual user profiles and their associated data. |
Security Note: ⚠️ This level is designed exclusively for machine identities (service roles). Never assign it to human users; humans who need full endpoint management should use level4 (full).
Lambda Inference
Lambda Inference policies control access to Lambda functions that serve ML models for predictions. Lambda is ideal for lightweight, event-driven inference workloads where cold start latency is acceptable and cost optimization is a priority.
Level 1: invoke-only
Purpose: Call Lambda inference functions without managing them
Principal: Machine (backend services, API Gateway) or Human (developers testing)
Typical Users:
Backend API services calling model endpoints
API Gateway integrations
Event-driven architectures (S3 triggers, SQS consumers)
Developers testing inference locally
What You Can Do:
✅ Invoke Lambda inference functions
✅ View function configuration and metadata
✅ List available inference functions
✅ Check function state, last update, and concurrency settings
What You Cannot Do:
❌ Create or delete Lambda functions
❌ Modify function code or configuration
❌ Change memory, timeout, or environment variables
❌ Manage layers or aliases
Example Scenario:
An API Gateway route receives image classification requests from a mobile app. It invokes a Lambda function that loads a lightweight PyTorch model and returns the predicted label. The API service only needs invoke permission â it never modifies the function.
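From the backend service's side, invoke-only access amounts to a single Invoke call. The sketch assumes a hypothetical function name such as `acme-prod-image-classifier` and a JSON payload shape chosen for illustration, and it takes a boto3 Lambda client as a parameter. A handler exception does not surface as an HTTP error: Lambda still returns status 200 and sets a `FunctionError` field, so the caller has to check for it.

```python
import json
from typing import Any, Dict


def classify(client: Any, function_name: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Synchronously invoke an inference function and return its JSON result.

    `client` is expected to be boto3.client('lambda'); only
    lambda:InvokeFunction is exercised.
    """
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous: wait for the prediction
        Payload=json.dumps(payload).encode("utf-8"),
    )
    result = json.loads(response["Payload"].read())
    # Handler errors come back as a 200 response with FunctionError set.
    if "FunctionError" in response:
        raise RuntimeError(f"{function_name} failed: {result}")
    return result
```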
Sample Permissions:
[
  {
    "Sid": "LambdaDiscoveryListActions",
    "Effect": "Allow",
    "Action": [
      "lambda:ListFunctions"
    ],
    "Resource": "*"
  },
  {
    "Sid": "LambdaDiscoveryActions",
    "Effect": "Allow",
    "Action": [
      "lambda:ListAliases",
      "lambda:ListTags",
      "lambda:GetFunction",
      "lambda:GetFunctionConfiguration",
      "lambda:GetPolicy",
      "lambda:GetAlias",
      "lambda:GetFunctionUrlConfig",
      "lambda:ListFunctionUrlConfigs",
      "lambda:GetProvisionedConcurrencyConfig",
      "lambda:ListProvisionedConcurrencyConfigs"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaInvocationActions",
    "Effect": "Allow",
    "Action": [
      "lambda:InvokeFunction",
      "lambda:InvokeFunctionUrl",
      "lambda:GetFunctionEventInvokeConfig",
      "lambda:ListFunctionEventInvokeConfigs",
      "lambda:GetFunctionConcurrency"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  }
]
"Sid": "LambdaDiscoveryListActions"
Grants permission to list Lambda functions so users can discover available inference endpoints. ListFunctions does not support resource-level permissions, so it sits in its own statement with Resource: *; function-level details such as aliases and tags are granted separately in LambdaDiscoveryActions.
| Policy Action | Description |
|---|---|
| lambda:ListFunctions | Retrieves a list of all Lambda functions in the region to identify inference endpoints. |
"Sid": "LambdaDiscoveryActions"
Provides permissions to view detailed information about Lambda functions, including their configuration, access policies, aliases, URL configurations, and concurrency settings. This allows users to understand the capabilities and status of inference functions without modifying them.
| Policy Action | Description |
|---|---|
| lambda:ListAliases | Lists all aliases for a specific function to find different deployment versions. |
| lambda:ListTags | Lists tags assigned to the function for resource filtering and organization. |
| lambda:GetFunction | Returns the configuration and a pre-signed URL to download the deployment package. |
| lambda:GetFunctionConfiguration | Provides specific metadata like runtime, handler, and environment variables. |
| lambda:GetPolicy | Retrieves the resource-based policy to verify access permissions. |
| lambda:GetAlias | Retrieves information about a specific function alias (e.g., "prod" or "staging"). |
| lambda:GetFunctionUrlConfig | Returns the URL configuration for functions used as direct HTTP(S) endpoints. |
| lambda:ListFunctionUrlConfigs | Lists all URL configurations associated with a function. |
| lambda:GetProvisionedConcurrencyConfig | Retrieves the status and details of the provisioned concurrency setup for a specific function version or alias. |
| lambda:ListProvisionedConcurrencyConfigs | Lists all provisioned concurrency configurations for a function to assess scaling readiness. |
"Sid": "LambdaInvocationActions"
Grants permission to execute Lambda functions and to inspect their asynchronous invocation and concurrency settings. Users can invoke inference functions for predictions but cannot modify the function code or configuration.
| Policy Action | Description |
|---|---|
| lambda:InvokeFunction | The primary action for synchronous or asynchronous execution of the inference code. |
| lambda:InvokeFunctionUrl | Enables execution via the built-in Lambda HTTP(S) endpoint. |
| lambda:GetFunctionEventInvokeConfig | Retrieves configuration for asynchronous delivery, such as destination and retry attempts. |
| lambda:ListFunctionEventInvokeConfigs | Lists all asynchronous invocation configurations for the function. |
| lambda:GetFunctionConcurrency | Allows checking if the function has enough reserved capacity to handle the expected inference load. |
Level 2: deploy-manage
Purpose: Deploy and configure Lambda inference functions
Principal: Human (ML engineers, DevOps) or Machine (CI/CD pipelines)
Typical Users:
ML engineers packaging models into Lambda functions
DevOps engineers configuring memory, timeout, and concurrency
CI/CD pipelines deploying new model versions
Data scientists publishing lightweight models
How It Differs from invoke-only (Level 1):
Adds deployment: create, update, and publish function versions
Adds configuration: modify memory, timeout, environment variables, layers
Adds alias management: create aliases for blue/green and canary deployments
Still no delete: function removal requires Level 3
What You Can Do:
✅ Everything in invoke-only, PLUS:
✅ Create new Lambda inference functions
✅ Update function code with new model versions
✅ Configure memory, timeout, and concurrency settings
✅ Manage function aliases for traffic shifting
✅ Add and update Lambda layers (model dependencies)
✅ Set environment variables (model paths, feature flags)
What You Cannot Do:
❌ Delete Lambda functions
❌ Modify IAM execution roles
❌ Change VPC or security group settings
❌ Change resource-based policies (AddPermission and RemovePermission are explicitly denied)
Example Scenario:
An ML engineer has retrained the image classification model and needs to deploy the new version. They update the Lambda function code, publish a new version, and shift 10% of traffic to the new version via a weighted alias â all without touching the production alias until validation passes.
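The engineer's workflow can be sketched with Level 2 actions alone: UpdateFunctionCode, GetFunctionConfiguration (polled by the waiter), PublishVersion, GetAlias, and UpdateAlias. The sketch assumes a container-image function (hence `ImageUri`), a hypothetical alias such as `canary` that already points at the current stable published version, and a boto3 Lambda client passed in by the caller. The alias keeps its primary version and sends the given fraction of traffic to the newly published version through `RoutingConfig`.

```python
from typing import Any, Dict


def canary_routing(new_version: str, weight: float) -> Dict[str, Any]:
    """RoutingConfig that sends `weight` (0..1) of alias traffic to `new_version`."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return {"AdditionalVersionWeights": {new_version: weight}}


def deploy_canary(client: Any, function_name: str, alias: str, image_uri: str,
                  weight: float = 0.1) -> str:
    """Push new code, publish it, and route a slice of `alias` traffic to it.

    `client` is expected to be boto3.client('lambda'). Returns the new version.
    """
    client.update_function_code(FunctionName=function_name, ImageUri=image_uri)
    # Publishing before the code update finishes would fail, so wait first.
    client.get_waiter("function_updated").wait(FunctionName=function_name)
    new_version = client.publish_version(FunctionName=function_name)["Version"]
    stable = client.get_alias(FunctionName=function_name, Name=alias)["FunctionVersion"]
    client.update_alias(
        FunctionName=function_name,
        Name=alias,
        FunctionVersion=stable,  # primary version keeps the remaining traffic
        RoutingConfig=canary_routing(new_version, weight),
    )
    return new_version
```

Once validation passes, pointing the alias's FunctionVersion at the new version and clearing RoutingConfig completes the rollout, still within UpdateAlias.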
Sample Permissions:
[
  {
    "Sid": "LambdaGlobalDiscovery",
    "Effect": "Allow",
    "Action": [
      "lambda:GetAccountSettings",
      "lambda:ListFunctions",
      "lambda:ListLayers",
      "lambda:ListLayerVersions",
      "lambda:ListCodeSigningConfigs",
      "lambda:ListEventSourceMappings"
    ],
    "Resource": "*"
  },
  {
    "Sid": "LambdaFunctionDiscovery",
    "Effect": "Allow",
    "Action": [
      "lambda:GetAlias",
      "lambda:GetFunction",
      "lambda:GetFunctionCodeSigningConfig",
      "lambda:GetFunctionConcurrency",
      "lambda:GetFunctionConfiguration",
      "lambda:GetFunctionEventInvokeConfig",
      "lambda:GetFunctionUrlConfig",
      "lambda:GetPolicy",
      "lambda:GetProvisionedConcurrencyConfig",
      "lambda:GetRuntimeManagementConfig",
      "lambda:ListAliases",
      "lambda:ListFunctionEventInvokeConfigs",
      "lambda:ListFunctionUrlConfigs",
      "lambda:ListProvisionedConcurrencyConfigs",
      "lambda:ListTags",
      "lambda:ListVersionsByFunction"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaLayerDiscovery",
    "Effect": "Allow",
    "Action": [
      "lambda:GetLayerVersion",
      "lambda:GetLayerVersionPolicy"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:layer:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaInvocation",
    "Effect": "Allow",
    "Action": [
      "lambda:InvokeFunction",
      "lambda:InvokeFunctionUrl"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaDeploymentAndConfiguration",
    "Effect": "Allow",
    "Action": [
      "lambda:CreateFunction",
      "lambda:UpdateFunctionCode",
      "lambda:UpdateFunctionConfiguration",
      "lambda:PublishVersion",
      "lambda:CreateAlias",
      "lambda:UpdateAlias",
      "lambda:PutFunctionConcurrency",
      "lambda:PutFunctionEventInvokeConfig",
      "lambda:PutProvisionedConcurrencyConfig",
      "lambda:CreateFunctionUrlConfig",
      "lambda:UpdateFunctionUrlConfig",
      "lambda:TagResource"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaLayerManagement",
    "Effect": "Allow",
    "Action": [
      "lambda:PublishLayerVersion"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:layer:{company_prefix}-{env}-*"
  },
  {
    "Sid": "PassRoleToLambda",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "lambda.amazonaws.com"
      }
    }
  },
  {
    "Sid": "DenyDeleteAndPermissionChanges",
    "Effect": "Deny",
    "Action": [
      "lambda:DeleteFunction",
      "lambda:DeleteAlias",
      "lambda:DeleteFunctionUrlConfig",
      "lambda:DeleteFunctionConcurrency",
      "lambda:DeleteFunctionEventInvokeConfig",
      "lambda:DeleteProvisionedConcurrencyConfig",
      "lambda:DeleteLayerVersion",
      "lambda:AddPermission",
      "lambda:RemovePermission"
    ],
    "Resource": "*"
  }
]
"Sid": "LambdaGlobalDiscovery"
Grants account-level and cross-function read access for actions that do not support resource-level scoping. These actions require Resource: * per the AWS Service Authorization Reference.
| Policy Action | Description |
|---|---|
| lambda:GetAccountSettings | Returns account-level limits and usage such as concurrent execution quotas. |
| lambda:ListFunctions | Retrieves a list of all Lambda functions in the region. |
| lambda:ListLayers | Lists all Lambda layers available in the region. |
| lambda:ListLayerVersions | Lists published versions of a specific layer. |
| lambda:ListCodeSigningConfigs | Lists code signing configurations in the account. |
| lambda:ListEventSourceMappings | Lists event source mappings in the account. |
"Sid": "LambdaFunctionDiscovery"
Grants read-only access to function-level metadata, configuration, aliases, concurrency settings, and URL configurations. All actions are scoped to tenant-prefixed functions.
| Policy Action | Description |
|---|---|
| lambda:GetAlias | Returns details about a specific function alias. |
| lambda:GetFunction | Returns the function configuration and a pre-signed URL for the deployment package. |
| lambda:GetFunctionCodeSigningConfig | Returns the code signing config attached to a function. |
| lambda:GetFunctionConcurrency | Returns the reserved concurrency configuration for a function. |
| lambda:GetFunctionConfiguration | Returns version-specific settings such as runtime, handler, memory, and timeout. |
| lambda:GetFunctionEventInvokeConfig | Returns the asynchronous invocation configuration (retries, destinations). |
| lambda:GetFunctionUrlConfig | Returns the function URL configuration for direct HTTP(S) access. |
| lambda:GetPolicy | Returns the resource-based policy attached to the function. |
| lambda:GetProvisionedConcurrencyConfig | Returns the provisioned concurrency configuration for an alias or version. |
| lambda:GetRuntimeManagementConfig | Returns the runtime management configuration (auto or manual updates). |
| lambda:ListAliases | Lists all aliases for a specific function. |
| lambda:ListFunctionEventInvokeConfigs | Lists asynchronous invocation configurations for a function. |
| lambda:ListFunctionUrlConfigs | Lists URL configurations associated with a function. |
| lambda:ListProvisionedConcurrencyConfigs | Lists provisioned concurrency configurations for a function. |
| lambda:ListTags | Lists tags assigned to the function. |
| lambda:ListVersionsByFunction | Lists published versions of a function. |
"Sid": "LambdaLayerDiscovery"
Grants read-only access to layer version details and policies. Layer actions require a layer ARN, not a function ARN, so they are scoped separately.
| Policy Action | Description |
|---|---|
| lambda:GetLayerVersion | Returns details about a specific layer version, including the download URL. |
| lambda:GetLayerVersionPolicy | Returns the resource-based policy for a layer version. |
"Sid": "LambdaInvocation"
Grants permission to execute Lambda inference functions via direct invocation or function URLs. Scoped to tenant-prefixed functions.
| Policy Action | Description |
|---|---|
| lambda:InvokeFunction | Sends a synchronous or asynchronous request to execute the function. |
| lambda:InvokeFunctionUrl | Invokes the function via its built-in HTTP(S) endpoint. |
"Sid": "LambdaDeploymentAndConfiguration"
Grants permissions to create functions, deploy new code versions, configure runtime settings, manage aliases for traffic shifting, and set concurrency. Does not include delete actions.
| Policy Action | Description |
|---|---|
| lambda:CreateFunction | Creates a new Lambda function with the specified code and configuration. |
| lambda:UpdateFunctionCode | Deploys new code to an existing function (e.g., updated model artifact). |
| lambda:UpdateFunctionConfiguration | Modifies function settings such as memory, timeout, and environment variables. |
| lambda:PublishVersion | Creates an immutable snapshot of the current function code and configuration. |
| lambda:CreateAlias | Creates a named alias pointing to a function version for traffic routing. |
| lambda:UpdateAlias | Updates an alias to point to a different version or adjust traffic weights. |
| lambda:PutFunctionConcurrency | Sets reserved concurrency to guarantee execution capacity. |
| lambda:PutFunctionEventInvokeConfig | Configures asynchronous invocation settings (retries, destinations). |
| lambda:PutProvisionedConcurrencyConfig | Allocates provisioned concurrency to reduce cold starts. |
| lambda:CreateFunctionUrlConfig | Creates an HTTP(S) endpoint for direct function invocation. |
| lambda:UpdateFunctionUrlConfig | Modifies the function URL configuration. |
| lambda:TagResource | Adds or updates tags on the function for organization and cost tracking. |
"Sid": "LambdaLayerManagement"
Grants permission to publish new layer versions containing model dependencies, shared libraries, or custom runtimes. Layer actions require a layer ARN, scoped separately from functions.
Policy Action |
Description |
|---|---|
lambda:PublishLayerVersion |
Publishes a new version of a layer with updated dependencies or libraries. |
"Sid": "PassRoleToLambda"
Allows passing an IAM execution role to Lambda when creating or updating functions. Scoped to tenant-prefixed roles and conditioned to the Lambda service only.
Policy Action |
Description |
|---|---|
iam:PassRole |
Passes an IAM role to Lambda as the function's execution role. |
"Sid": "DenyDeleteAndPermissionChanges"
Explicitly prevents deletion of functions, aliases, layers, concurrency configs, and URL configs. Also blocks changes to resource-based policies (AddPermission/RemovePermission) which control cross-account access. These destructive and permission-escalation actions are reserved for Level 3 (full).
Policy Action Denied |
Description |
|---|---|
lambda:DeleteFunction |
(Denied) Deletes a Lambda function and all its versions. |
lambda:DeleteAlias |
(Denied) Deletes a function alias. |
lambda:DeleteFunctionUrlConfig |
(Denied) Removes the function URL endpoint. |
lambda:DeleteFunctionConcurrency |
(Denied) Removes reserved concurrency from a function. |
lambda:DeleteFunctionEventInvokeConfig |
(Denied) Removes asynchronous invocation configuration. |
lambda:DeleteProvisionedConcurrencyConfig |
(Denied) Removes provisioned concurrency allocation. |
lambda:DeleteLayerVersion |
(Denied) Deletes a published layer version. |
lambda:AddPermission |
(Denied) Adds a statement to the function's resource-based policy. |
lambda:RemovePermission |
(Denied) Removes a statement from the function's resource-based policy. |
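The Deny block works because an explicit Deny overrides any Allow, including an Allow that comes from another attached policy. The simplified evaluator below is a hypothetical sketch that ignores Resource and Condition. It shows the three possible outcomes for an abbreviated copy of the Level 2 statements.

```python
from fnmatch import fnmatchcase
from typing import List


def _matches(patterns, action: str) -> bool:
    # IAM action names are case-insensitive and may contain '*' or '?' wildcards.
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def evaluate(statements: List[dict], action: str) -> str:
    """Simplified IAM decision for one action (Resource and Condition ignored)."""
    matching = [s for s in statements if _matches(s["Action"], action)]
    if any(s["Effect"] == "Deny" for s in matching):
        return "ExplicitDeny"  # an explicit Deny always wins
    if any(s["Effect"] == "Allow" for s in matching):
        return "Allow"
    return "ImplicitDeny"  # nothing allows it, so it is denied by default


# Abbreviated Level 2 statements (subset of actions from the tables above).
level2 = [
    {"Effect": "Allow", "Action": ["lambda:UpdateFunctionCode", "lambda:PublishVersion"]},
    {"Effect": "Deny", "Action": ["lambda:DeleteFunction", "lambda:AddPermission",
                                  "lambda:RemovePermission"]},
]
# A hypothetical broad grant arriving from some other attached policy.
broad_grant = {"Effect": "Allow", "Action": "lambda:*"}

print(evaluate(level2, "lambda:UpdateFunctionCode"))              # Allow
print(evaluate(level2 + [broad_grant], "lambda:DeleteFunction"))  # ExplicitDeny
print(evaluate(level2, "lambda:ListFunctions"))                   # ImplicitDeny
```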
Level 3: full¶
Purpose: Complete Lambda inference function lifecycle management
Principal: Human (platform admins, MLOps leads)
Typical Users:
Platform administrators
MLOps team leads
Infrastructure engineers
How It Differs from deploy-manage (Level 2):
Adds delete lifecycle: can remove functions, aliases, layers, versions, and configs
Adds resource policy management: AddPermission/RemovePermission for cross-account invocation control
Adds code signing enforcement: manage code signing configurations
Adds event source mapping management: create/update/delete for event-driven inference
Account-wide scope: no function name restrictions (lambda:*)
What You Can Do:
✅ Everything in deploy-manage, PLUS:
✅ Delete functions, aliases, layers, versions, and configs
✅ Manage resource-based policies (cross-account invocation control)
✅ Manage code signing configurations
✅ Create, update, and delete event source mappings
✅ Full account-wide access, not restricted to naming conventions
What You Cannot Do:
❌ Nothing: full Lambda inference management
Example Scenario:
The platform team is decommissioning a retired product line. They need to delete the associated Lambda inference functions, remove their event source mappings, clean up resource-based policies that granted cross-account access, and delete the Lambda layers that were dedicated to those functions.
Sample Permissions:
[
{
"Sid": "LambdaFullAccess",
"Effect": "Allow",
"Action": "lambda:*",
"Resource": "*"
},
{
"Sid": "PassRoleToLambda",
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
"Condition": {
"StringEquals": {
"iam:PassedToService": "lambda.amazonaws.com"
}
}
}
]
Why lambda:* instead of explicit action enumeration? Lambda's API surface grows frequently as AWS adds new features. Unlike SageMaker, where new actions can spin up expensive training jobs or endpoints, new Lambda actions have low cost and blast-radius impact. Maintaining an explicit list of 50+ actions creates maintenance debt that leads to stale policies and broken deployments when AWS adds actions. The Resource: * scope is acceptable here because Level 3 principals are platform administrators who need to govern the entire account, including functions that may not follow naming conventions.
"Sid": "LambdaFullAccess"
Grants full Lambda access across all resource types in the account: functions, layers, event source mappings, code signing configs, and any future Lambda resource types. This is the administrative level for platform teams who manage the complete Lambda inference lifecycle.
Policy Action |
Description |
|---|---|
lambda:* |
All Lambda actions: create, read, update, delete, invoke, and manage across all Lambda resource types. |
"Sid": "PassRoleToLambda"
Allows passing an IAM execution role to Lambda when creating or updating functions. Even at full Lambda access, PassRole remains scoped to tenant-prefixed roles to prevent privilege escalation via arbitrary role attachment.
Policy Action |
Description |
|---|---|
iam:PassRole |
Passes an IAM role to Lambda as the function's execution role. Scoped to tenant-prefixed roles matching the Resource pattern in the sample permissions above. |
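PassRoleToLambda combines two checks, and both must hold: the role ARN must match the tenant-prefixed Resource pattern, and iam:PassedToService must equal lambda.amazonaws.com. The sketch below approximates that logic in Python. The sample account ID, prefix, environment, and role names are hypothetical. The same shape applies to PassRoleToBedrock with bedrock.amazonaws.com.

```python
from fnmatch import fnmatchcase

# Resource pattern copied from the PassRoleToLambda statement above.
ROLE_PATTERN = "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*"


def can_pass_role(role_arn: str, passed_to_service: str, *, account_id: str,
                  company_prefix: str, env: str,
                  allowed_service: str = "lambda.amazonaws.com") -> bool:
    """Approximate the two checks PassRoleToLambda applies to iam:PassRole."""
    pattern = ROLE_PATTERN.format(account_id=account_id,
                                  company_prefix=company_prefix, env=env)
    resource_ok = fnmatchcase(role_arn, pattern)         # Resource element
    condition_ok = passed_to_service == allowed_service  # StringEquals on iam:PassedToService
    return resource_ok and condition_ok


ids = {"account_id": "123456789012", "company_prefix": "acme", "env": "dev"}
role = "arn:aws:iam::123456789012:role/acme-dev-inference-role-lambda"
print(can_pass_role(role, "lambda.amazonaws.com", **ids))
print(can_pass_role(role, "ec2.amazonaws.com", **ids))
print(can_pass_role("arn:aws:iam::123456789012:role/OrgAdmin", "lambda.amazonaws.com", **ids))
```

The condition blocks a principal from reusing a legitimate tenant role with a different service. The Resource pattern blocks attaching an arbitrary, more privileged role to a function.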
TODO: VPC Governance Subsection. Document two config-driven VPC condition key patterns that apply to both Level 2 and Level 3:
Enforce specific VPC: Deny UpdateFunctionConfiguration when lambda:VpcIds doesn't match the config value
Deny all VPC: Deny UpdateFunctionConfiguration when lambda:VpcIds is present
Config schema: lambda_inference.vpc_policy ("enforce" | "deny" | "none")
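A minimal sketch of how a template generator might turn the vpc_policy setting into Deny statements. The generator function, the Sids, and the allowed_vpc_ids parameter are hypothetical. The lambda:VpcIds condition key and the StringNotEquals and Null operators are standard IAM/Lambda features. Adding lambda:CreateFunction alongside UpdateFunctionConfiguration is an assumption beyond the TODO.

```python
from typing import List, Optional

# Assumption: CreateFunction is denied alongside UpdateFunctionConfiguration so a
# brand-new function cannot bypass the rule; the TODO names only the latter.
VPC_GOVERNED_ACTIONS = ["lambda:CreateFunction", "lambda:UpdateFunctionConfiguration"]


def vpc_deny_statements(vpc_policy: str,
                        allowed_vpc_ids: Optional[List[str]] = None) -> List[dict]:
    """Map lambda_inference.vpc_policy to Deny statements (hypothetical generator)."""
    if vpc_policy == "none":
        return []
    if vpc_policy == "enforce":
        if not allowed_vpc_ids:
            raise ValueError("vpc_policy 'enforce' needs at least one VPC ID")
        return [{
            "Sid": "EnforceApprovedVpc",
            "Effect": "Deny",
            "Action": VPC_GOVERNED_ACTIONS,
            "Resource": "*",
            # Negated operators also evaluate to true when the key is missing,
            # so requests without a VPC attachment are denied as well.
            "Condition": {"StringNotEquals": {"lambda:VpcIds": allowed_vpc_ids}},
        }]
    if vpc_policy == "deny":
        return [{
            "Sid": "DenyVpcAttachment",
            "Effect": "Deny",
            "Action": VPC_GOVERNED_ACTIONS,
            "Resource": "*",
            # Null == "false" means the key IS present, i.e. a VPC is being set.
            "Condition": {"Null": {"lambda:VpcIds": "false"}},
        }]
    raise ValueError(f"unknown vpc_policy: {vpc_policy!r}")


for mode in ("none", "enforce", "deny"):
    print(mode, vpc_deny_statements(mode, ["vpc-0abc1234"]))
```

Before adopting either pattern, check how lambda:VpcIds behaves on UpdateFunctionConfiguration calls that do not include VPC settings, since the "enforce" condition also fires when the key is absent.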
Bedrock Inference¶
Bedrock Inference policies control access to foundation models (FMs) for generative AI workloads. Unlike SageMaker, where you manage your own endpoints, Bedrock is a fully managed service, so the policy focus is on which models can be invoked, where inference runs (cross-region), and how much throughput is provisioned.
Level 1: invoke-only¶
Purpose: Call foundation models for predictions without managing model access or throughput
Principal: Machine (backend services, chatbots) or Human (developers, analysts)
Typical Users:
Backend services integrating generative AI
Customer-facing chatbots and assistants
Developers prototyping with foundation models
Analysts using text summarization or classification
What You Can Do:
✅ Invoke allowed foundation models
✅ Use the Converse API for chat-based interactions
✅ List available foundation models
✅ View model details and capabilities
What You Cannot Do:
❌ Enable or disable model access
❌ Create or manage provisioned throughput
❌ Configure cross-region inference
❌ Manage custom models or fine-tuning jobs
❌ Create or modify guardrails
Example Scenario:
A customer support chatbot needs to call Claude for generating responses. The service role can invoke the model but cannot change which models are available or provision dedicated throughput.
Sample Permissions:
[
{
"Sid": "BedrockDiscovery",
"Effect": "Allow",
"Action": [
"bedrock:ListFoundationModels",
"bedrock:GetFoundationModel"
],
"Resource": "*"
},
{
"Sid": "BedrockStandardInference",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "*"
},
{
"Sid": "BedrockConverseInference",
"Effect": "Allow",
"Action": [
"bedrock:Converse",
"bedrock:ConverseStream"
],
"Resource": "*"
}
]
"Sid": "BedrockDiscovery"
Allows users to discover available foundation models and view their specific capabilities and details.
Policy Action |
Description |
|---|---|
bedrock:ListFoundationModels |
Lists the foundation models available in Amazon Bedrock, which is necessary to identify which models can be invoked. |
bedrock:GetFoundationModel |
Retrieves detailed information about a specific foundation model, such as input/output modalities and customization support. |
"Sid": "BedrockStandardInference"
Enables the core ability to send prompts and receive responses from foundation models, either as a single payload or as a stream. The chat-oriented Converse APIs are granted separately in BedrockConverseInference.
Policy Action |
Description |
|---|---|
bedrock:InvokeModel |
Sends a prompt to a specified model and receives the entire response in a single payload. |
bedrock:InvokeModelWithResponseStream |
Sends a prompt to a model and receives the response as a series of tokens (streaming), ideal for real-time applications. |
"Sid": "BedrockConverseInference"
Enables multi-turn chat interactions via the Converse API. Kept as a separate Sid from BedrockStandardInference for three reasons:
Resource-level control: allows scoping Converse to specific foundation models or inference profiles independently from InvokeModel (e.g., for cost tracking)
Streaming restrictions: if compliance requires disabling streaming (which can bypass certain content inspection or logging), ConverseStream can be split into its own Sid
Auditability: separate Sids make it easier to identify which statement granted a specific permission in IAM policy evaluation
Note: bedrock:Converse and bedrock:ConverseStream are functional Bedrock API actions but were not listed in the AWS IAM Service Authorization Reference at time of writing. If they are authorized via InvokeModel under the hood, listing them explicitly does not affect policy behavior.
Policy Action |
Description |
|---|---|
bedrock:Converse |
Provides a consistent API for multi-turn chat conversations, managing message history and formatting for supported models. |
bedrock:ConverseStream |
Allows for multi-turn chat conversations with the benefit of streaming responses for lower perceived latency. |
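The Bedrock Converse API documentation states that calling Converse requires bedrock:InvokeModel permission and calling ConverseStream requires bedrock:InvokeModelWithResponseStream. Assuming that holds, the hypothetical lookup below shows the practical consequence. Denying bedrock:InvokeModelWithResponseStream disables streaming for both the InvokeModel and Converse API families. The mapping table and helper are illustrative and are not an AWS API.

```python
from typing import Iterable

# Runtime API call -> IAM action that authorizes it (per AWS Bedrock docs).
AUTHORIZING_ACTION = {
    "InvokeModel": "bedrock:InvokeModel",
    "InvokeModelWithResponseStream": "bedrock:InvokeModelWithResponseStream",
    "Converse": "bedrock:InvokeModel",
    "ConverseStream": "bedrock:InvokeModelWithResponseStream",
}


def is_call_allowed(api_call: str, allowed: Iterable[str],
                    denied: Iterable[str] = ()) -> bool:
    """True if the authorizing action is allowed and not explicitly denied."""
    action = AUTHORIZING_ACTION[api_call]
    return action in set(allowed) and action not in set(denied)


level1 = {"bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"}
no_streaming = {"bedrock:InvokeModelWithResponseStream"}
for call in ("Converse", "ConverseStream"):
    print(call, is_call_allowed(call, level1, denied=no_streaming))
```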
Level 2: model-manage¶
Purpose: Manage model access, guardrails, and inference configurations
Principal: Human (ML engineers, AI/ML team leads)
Typical Users:
ML engineers configuring model access for teams
AI/ML team leads managing guardrails and content filters
DevOps engineers setting up cross-region inference
Data scientists managing custom model imports
How It Differs from invoke-only (Level 1):
Adds model access management: enable/disable foundation models for the account
Adds guardrail management: create and configure content filters and safety controls
Adds cross-region configuration: control where inference requests are routed
Still no throughput provisioning or deletion: cost-impacting decisions require Level 3
What You Can Do:
✅ Everything in invoke-only, PLUS:
✅ Enable and disable foundation model access
✅ Create and configure guardrails (content filters, topic blocks)
✅ Manage custom model imports
✅ Configure cross-region inference profiles
✅ View usage metrics and invocation logs
What You Cannot Do:
❌ Create or delete provisioned throughput (cost-impacting)
❌ Delete guardrails
❌ Manage account-level Bedrock settings
Example Scenario:
The AI/ML team lead needs to enable a new Anthropic model for the development team, create a guardrail that blocks PII in model responses, and configure cross-region inference so requests can fail over to us-east-1 if us-west-2 is at capacity.
Sample Permissions:
[
{
"Sid": "BedrockStandardInference",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "*"
},
{
"Sid": "BedrockConverseInference",
"Effect": "Allow",
"Action": [
"bedrock:Converse",
"bedrock:ConverseStream"
],
"Resource": "*"
},
{
"Sid": "ModelAccessManagement",
"Effect": "Allow",
"Action": [
"bedrock:GetFoundationModel",
"bedrock:ListFoundationModels",
"bedrock:PutModelInvocationLoggingConfiguration",
"bedrock:GetModelInvocationLoggingConfiguration",
"bedrock:ListModelInvocationJobs",
"bedrock:PutFoundationModelEntitlement",
"bedrock:PutUseCaseForModelAccess",
"bedrock:ListFoundationModelAgreementOffers",
"bedrock:CreateFoundationModelAgreement",
"bedrock:GetFoundationModelAvailability",
"bedrock:DeleteFoundationModelAgreement"
],
"Resource": "*"
},
{
"Sid": "GuardrailManagement",
"Effect": "Allow",
"Action": [
"bedrock:CreateGuardrail",
"bedrock:UpdateGuardrail",
"bedrock:CreateGuardrailVersion",
"bedrock:GetGuardrail",
"bedrock:ListGuardrails"
],
"Resource": "*"
},
{
"Sid": "CustomModelImportManagement",
"Effect": "Allow",
"Action": [
"bedrock:ImportModel",
"bedrock:GetCustomModel",
"bedrock:ListCustomModels",
"bedrock:CreateModelImportJob",
"bedrock:GetModelImportJob",
"bedrock:ListModelImportJobs",
"bedrock:StopModelImportJob"
],
"Resource": "*"
},
{
"Sid": "CrossRegionInferenceManagement",
"Effect": "Allow",
"Action": [
"bedrock:CreateInferenceProfile",
"bedrock:GetInferenceProfile",
"bedrock:ListInferenceProfiles",
"bedrock:UpdateInferenceProfile"
],
"Resource": "*"
},
{
"Sid": "ObservabilityAndMetrics",
"Effect": "Allow",
"Action": [
"cloudwatch:GetMetricData",
"cloudwatch:ListMetrics",
"logs:DescribeLogGroups",
"logs:GetLogEvents"
],
"Resource": "*"
},
{
"Sid": "DenyBedrockDeleteOperations",
"Effect": "Deny",
"Action": [
"bedrock:DeleteCustomModel",
"bedrock:DeleteModelInvocationLoggingConfiguration",
"bedrock:DeleteProvisionedModelThroughput",
"bedrock:DeleteModelImportJob",
"bedrock:DeleteCustomModelDeployment",
"bedrock:DeleteInferenceProfile",
"bedrock:DeletePromptRouter",
"bedrock:DeleteGuardrail",
"bedrock:DeleteKnowledgeBase",
"bedrock:DeleteAgent"
],
"Resource": "*"
}
]
"Sid": "BedrockStandardInference"
Enables the core ability to send prompts and receive responses from foundation models, either as a single payload or as a stream. The chat-oriented Converse APIs are granted separately in BedrockConverseInference.
Policy Action |
Description |
|---|---|
bedrock:InvokeModel |
Sends a prompt to a specified model and receives the entire response in a single payload. |
bedrock:InvokeModelWithResponseStream |
Sends a prompt to a model and receives the response as a series of tokens (streaming), ideal for real-time applications. |
"Sid": "BedrockConverseInference"
Multi-turn chat interactions via the Converse API. Kept as a separate Sid for independent resource scoping, streaming control, and auditability (see Level 1 rationale).
Policy Action |
Description |
|---|---|
bedrock:Converse |
Provides a consistent API for multi-turn chat conversations, managing message history and formatting for supported models. |
bedrock:ConverseStream |
Allows for multi-turn chat conversations with the benefit of streaming responses for lower perceived latency. |
"Sid": "ModelAccessManagement"
Permissions to enable, disable, and manage entitlements for foundation models within the account.
Policy Action |
Description |
|---|---|
bedrock:GetFoundationModel |
Retrieves detailed information and properties about a specific Amazon Bedrock foundation model. |
bedrock:ListFoundationModels |
Lists all foundation models available in Amazon Bedrock for the current region. |
bedrock:PutModelInvocationLoggingConfiguration |
Configures where to store model invocation logs, such as S3 buckets or CloudWatch Logs. |
bedrock:GetModelInvocationLoggingConfiguration |
Retrieves the current configuration for model invocation logging. |
bedrock:ListModelInvocationJobs |
Lists asynchronous model invocation jobs to track batch processing status. |
bedrock:PutFoundationModelEntitlement |
Submits a request for foundation model entitlement. Largely automated for most models but still required for certain provider-specific access flows. |
bedrock:PutUseCaseForModelAccess |
Submits the required provider use-case form for first-time model access, such as the Anthropic use-case disclosure. One-time per account or organization. |
bedrock:ListFoundationModelAgreementOffers |
Grants permission to view available agreement offers for foundation models. |
bedrock:CreateFoundationModelAgreement |
Grants permission to officially accept an offer and create a new agreement for a foundation model. |
bedrock:GetFoundationModelAvailability |
Grants permission to check if a specific foundation model is available for use in your account or region. |
bedrock:DeleteFoundationModelAgreement |
Grants permission to terminate or delete an existing foundation model agreement. |
"Sid": "GuardrailManagement"
Permissions to create and configure safety controls, content filters, and PII masking without deletion rights.
Policy Action |
Description |
|---|---|
bedrock:CreateGuardrail |
Creates a new guardrail to filter sensitive content or block specific topics in model responses. |
bedrock:UpdateGuardrail |
Modifies existing guardrail configurations, such as updating filter strengths or blocked words. |
bedrock:CreateGuardrailVersion |
Creates a snapshot version of a guardrail for consistent deployment across environments. |
bedrock:GetGuardrail |
Retrieves the detailed configuration of a specific guardrail. |
bedrock:ListGuardrails |
Lists all guardrails defined in the account. |
"Sid": "CustomModelImportManagement"
Manages custom model imports: starting, monitoring, and stopping import jobs, and viewing the resulting custom models.
Policy Action |
Description |
|---|---|
bedrock:ImportModel |
Initiates the process of importing a custom model into Amazon Bedrock. |
bedrock:GetCustomModel |
Retrieves details about a custom or imported model. |
bedrock:ListCustomModels |
Lists all custom models available in the account. |
bedrock:CreateModelImportJob |
Starts the process of importing a custom model into Bedrock. |
bedrock:GetModelImportJob |
Retrieves detailed information and the current status of a specific import job. |
bedrock:ListModelImportJobs |
Returns a list of all model import jobs submitted. |
bedrock:StopModelImportJob |
Immediately cancels a model import job that is currently in progress. |
"Sid": "CrossRegionInferenceManagement"
Enables managing and tracking model usage across one or multiple AWS regions.
Policy Action |
Description |
|---|---|
bedrock:CreateInferenceProfile |
Sets up cross-region inference profiles to manage request routing and failover. |
bedrock:GetInferenceProfile |
Retrieves details about a specific inference profile. |
bedrock:ListInferenceProfiles |
Lists available inference profiles for the account. |
bedrock:UpdateInferenceProfile |
Grants permission to modify the settings of an existing application inference profile, such as updating its description or configuration. |
"Sid": "ObservabilityAndMetrics"
Grants permissions to access CloudWatch metrics and logs related to Bedrock model invocations for monitoring and troubleshooting.
Policy Action |
Description |
|---|---|
cloudwatch:GetMetricData |
Grants permission to retrieve batch amounts of CloudWatch metric data and perform metric math on the retrieved data. |
cloudwatch:ListMetrics |
Grants permission to retrieve a list of valid metrics stored for the AWS account owner, which can then be used to get statistical data. |
logs:DescribeLogGroups |
Grants permission to return all log groups associated with the requesting AWS account, including data sources that ingest into them. |
logs:GetLogEvents |
Grants permission to retrieve individual log events from a specific log stream, with the ability to filter results by time range. |
"Sid": "DenyBedrockDeleteOperations"
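Explicitly denies Bedrock delete operations, reserving removal of custom models, guardrails, inference profiles, provisioned throughput, agents, and knowledge bases for Level 3 (full). An explicit Deny overrides any Allow, including one granted by another attached policy.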
Policy Action |
Description |
|---|---|
bedrock:DeleteCustomModel |
Deletes a custom model that was previously created through model customization (fine-tuning). |
bedrock:DeleteModelInvocationLoggingConfiguration |
Removes the configuration that logs model inputs and outputs to S3 or CloudWatch, which is often used for auditing. |
bedrock:DeleteProvisionedModelThroughput |
Deletes a Provisioned Throughput reservation; note that this typically cannot be done before a commitment term ends. |
bedrock:DeleteModelImportJob |
Deletes a record or job associated with importing a customized model from other environments like Amazon SageMaker. |
bedrock:DeleteCustomModelDeployment |
Stops and removes a deployed custom model, making its ARN unavailable for further inference. |
bedrock:DeleteInferenceProfile |
Deletes an inference profile, which is used to manage and track model invocation across different regions or configurations. |
bedrock:DeletePromptRouter |
Removes a prompt router used to direct incoming requests to specific models or versions. |
bedrock:DeleteGuardrail |
Deletes a Bedrock Guardrail, which provides content filtering and safety controls for generative AI applications. |
bedrock:DeleteKnowledgeBase |
Deletes a Knowledge Base resource used for Retrieval-Augmented Generation (RAG) workflows. |
bedrock:DeleteAgent |
Deletes an Amazon Bedrock Agent that automates tasks by interacting with foundation models and other AWS services. |
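When Level 2 is edited, an Allow can easily undercut the Deny block. A small lint like the hypothetical sketch below lists every allowed Delete* action that no Deny statement covers. Run on an abbreviated copy of the Level 2 sample, it surfaces only bedrock:DeleteFoundationModelAgreement. That action appears to be granted on purpose so engineers can disable model access.

```python
from fnmatch import fnmatchcase
from typing import List


def _actions(stmt: dict) -> List[str]:
    action = stmt["Action"]
    return action if isinstance(action, list) else [action]


def uncovered_delete_actions(policy: List[dict]) -> List[str]:
    """List allowed Delete* actions that no Deny statement covers.

    Only literal action names are inspected; wildcard Allows such as
    bedrock:* are not expanded.
    """
    denies = [a for s in policy if s["Effect"] == "Deny" for a in _actions(s)]
    flagged = []
    for stmt in policy:
        if stmt["Effect"] != "Allow":
            continue
        for action in _actions(stmt):
            name = action.split(":", 1)[1]
            covered = any(fnmatchcase(action.lower(), d.lower()) for d in denies)
            if name.startswith("Delete") and not covered:
                flagged.append(action)
    return sorted(flagged)


# Abbreviated copy of the Level 2 sample permissions above.
level2 = [
    {"Sid": "ModelAccessManagement", "Effect": "Allow",
     "Action": ["bedrock:GetFoundationModel", "bedrock:PutFoundationModelEntitlement",
                "bedrock:CreateFoundationModelAgreement",
                "bedrock:DeleteFoundationModelAgreement"]},
    {"Sid": "GuardrailManagement", "Effect": "Allow",
     "Action": ["bedrock:CreateGuardrail", "bedrock:UpdateGuardrail"]},
    {"Sid": "DenyBedrockDeleteOperations", "Effect": "Deny",
     "Action": ["bedrock:DeleteGuardrail", "bedrock:DeleteCustomModel",
                "bedrock:DeleteInferenceProfile"]},
]
print(uncovered_delete_actions(level2))
```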
Level 3: full¶
Purpose: Complete Bedrock platform management including cost-impacting operations
Principal: Human (platform admins, cloud architects)
Typical Users:
Platform administrators
Cloud architects
FinOps engineers (provisioned throughput decisions)
Security team (account-level controls)
How It Differs from model-manage (Level 2):
Adds provisioned throughput: create, modify, and delete dedicated model capacity (significant cost)
Adds delete operations: can remove guardrails, custom models, inference profiles, agents, knowledge bases
Adds account-level settings: manage Bedrock service-level configurations
Full fine-tuning control: create and manage model customization jobs
Full platform governance: agents, knowledge bases, evaluations, prompt routers, batch inference
What You Can Do:
✅ Everything in model-manage, PLUS:
✅ Create and delete provisioned throughput (dedicated capacity)
✅ Delete guardrails, custom models, inference profiles, agents, knowledge bases
✅ Manage model fine-tuning and customization jobs
✅ Configure account-level Bedrock settings
✅ Manage agents, knowledge bases, evaluations, and prompt routers
What You Cannot Do:
❌ Nothing: full Bedrock management
Example Scenario:
The platform team needs to provision dedicated throughput for the production chatbot ahead of a product launch, clean up unused guardrails from a decommissioned project, and configure account-level logging for all Bedrock invocations.
Sample Permissions:
[
{
"Sid": "BedrockFullAccess",
"Effect": "Allow",
"Action": "bedrock:*",
"Resource": "*"
},
{
"Sid": "BedrockFullObservability",
"Effect": "Allow",
"Action": [
"cloudwatch:GetMetricData",
"cloudwatch:ListMetrics",
"logs:DescribeLogGroups",
"logs:GetLogEvents"
],
"Resource": "*"
},
{
"Sid": "PassRoleToBedrock",
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
"Condition": {
"StringEquals": {
"iam:PassedToService": "bedrock.amazonaws.com"
}
}
}
]
Why bedrock:* instead of explicit action enumeration? Bedrock's API surface is growing rapidly as AWS adds agents, knowledge bases, evaluations, prompt routers, batch inference, marketplace models, and more. Bedrock's most significant cost-impacting actions are provisioned throughput and, to a lesser degree, model customization jobs, and admins manage both explicitly. Maintaining an explicit list of 80+ actions creates maintenance debt that leads to stale policies and broken deployments when AWS adds actions. The Resource: * scope is acceptable here because Level 3 principals are platform administrators who need to govern the entire account.
"Sid": "BedrockFullAccess"
Grants full Bedrock access across all resource types in the account: foundation models, custom models, guardrails, inference profiles, provisioned throughput, agents, knowledge bases, evaluations, and any future Bedrock resource types. This is the administrative level for platform teams who manage the complete Bedrock lifecycle.
Policy Action |
Description |
|---|---|
bedrock:* |
All Bedrock actions: invoke, model access, guardrails, provisioned throughput, custom models, agents, knowledge bases, evaluations, prompt routers, and all future actions. |
"Sid": "BedrockFullObservability"
Grants permissions to access CloudWatch metrics and logs related to Bedrock model invocations for monitoring and troubleshooting. These are non-Bedrock namespace actions that bedrock:* does not cover.
Policy Action |
Description |
|---|---|
cloudwatch:GetMetricData |
Grants permission to retrieve batch amounts of CloudWatch metric data and perform metric math on the retrieved data. |
cloudwatch:ListMetrics |
Grants permission to retrieve a list of valid metrics stored for the AWS account owner, which can then be used to get statistical data. |
logs:DescribeLogGroups |
Grants permission to return all log groups associated with the requesting AWS account, including data sources that ingest into them. |
logs:GetLogEvents |
Grants permission to retrieve individual log events from a specific log stream, with the ability to filter results by time range. |
"Sid": "PassRoleToBedrock"
Allows passing an IAM service role to Bedrock for operations that require it, such as model customization (fine-tuning) jobs and model import jobs. Even at full Bedrock access, PassRole remains scoped to tenant-prefixed roles to prevent privilege escalation via arbitrary role attachment.
Policy Action |
Description |
|---|---|
iam:PassRole |
Passes an IAM role to Bedrock as the service role for customization and import jobs. Scoped to tenant-prefixed roles matching the Resource pattern in the sample permissions above. |
TODO: Config-Driven Bedrock Model Scoping
Add an inference section to client configs that controls Resource ARN generation for Bedrock invoke actions.
Config schema:
inference:
  bedrock:
    allowed_models:        # list of foundation model ID patterns
      - "anthropic.claude-3-sonnet-*"
      - "anthropic.claude-3-haiku-*"
      - "amazon.titan-embed-text-v1"
    allowed_regions:       # for cross-region inference profiles
      - "us-west-2"
      - "us-east-1"
Template generator behavior:
allowed_models present → Resource becomes a list of arn:aws:bedrock:{region}::foundation-model/<model-id> ARNs
allowed_models absent or ["*"] → Resource stays "*"
Applies to Level 1 and Level 2 invoke Sids; Level 3 (full) always uses "*"
Per-tier defaults:
Startup: omit or ["*"] (no restriction, encourage exploration)
Medium: explicit model list (cost control)
Enterprise: explicit model list (compliance requirement)
Code changes needed:
Add inference key to validation schemas (validation-schema-startup.yaml, medium, enterprise)
Update template generator to read inference.bedrock.allowed_models and build Resource ARN list
Update client configs with inference section (all 3 tiers)
Add unit tests for ARN generation with and without allowed_models
Also include in inference section (for consistency with existing designs):
inference:
  sagemaker:
    endpoint_prefix: "{company_prefix}-{env}"
  lambda:
    vpc_policy: "none"     # "enforce" | "deny" | "none"
  bedrock:
    allowed_models: ["*"]
    allowed_regions: ["us-west-2"]
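A minimal sketch of the ARN generation this TODO describes, in the kind of logic the planned unit tests would cover. The function name and the default_region parameter are hypothetical. Two further behaviors are assumptions: an empty allowed_models list is treated as a config error, and one ARN is emitted per allowed region. Invoking through a cross-region inference profile also requires allowing the inference-profile ARN, which this sketch omits.

```python
from typing import List, Union


def bedrock_invoke_resources(config: dict, default_region: str) -> Union[str, List[str]]:
    """Resource value for the Level 1/2 invoke Sids (hypothetical generator logic)."""
    bedrock = config.get("inference", {}).get("bedrock", {})
    models = bedrock.get("allowed_models")
    if models is None or models == ["*"]:
        return "*"  # absent or ["*"]: no model restriction
    if not models:
        # Assumption: an empty list is a config error rather than "*".
        raise ValueError("allowed_models must not be an empty list")
    # Assumption: with cross-region inference, emit one ARN per allowed region.
    regions = bedrock.get("allowed_regions") or [default_region]
    return [
        f"arn:aws:bedrock:{region}::foundation-model/{model}"
        for region in regions
        for model in models
    ]


cfg = {"inference": {"bedrock": {
    "allowed_models": ["anthropic.claude-3-haiku-*"],
    "allowed_regions": ["us-west-2", "us-east-1"],
}}}
for arn in bedrock_invoke_resources(cfg, "us-west-2"):
    print(arn)
```

Treating an empty list as an error is a deliberate choice. Otherwise a typo that empties the list would silently widen access to "*".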
KMS Policies¶
KMS (Key Management Service) policies control read-only access to encryption keys used across the MLOps platform. KMS keys protect S3 objects, SageMaker model artifacts, and other sensitive data.
Level 1: read-only¶
Purpose: Verify encryption settings and key configurations without the ability to encrypt, decrypt, or modify keys.
Typical Users:
Operations support (verify encryption compliance)
Security auditors (review key policies and rotation status)
Compliance reviewers (confirm encryption standards)
What You Can Do:
✅ View key metadata, policies, and rotation status
✅ List all KMS keys and aliases in the account
✅ View resource tags on keys
✅ Retrieve public keys (for asymmetric keys)
What You Cannot Do:
❌ No Encrypt/Decrypt: Cannot use keys to encrypt or decrypt data
❌ No Key Management: Cannot create, disable, delete, or schedule deletion of keys
❌ No Policy Changes: Cannot modify key policies or grants
❌ No Key Rotation Changes: Cannot enable or disable automatic key rotation
Sample Permissions:
[
{
"Sid": "ReadOnlyAccessForAllKMSKeysInAccount",
"Effect": "Allow",
"Action": [
"kms:GetPublicKey",
"kms:GetKeyRotationStatus",
"kms:GetKeyPolicy",
"kms:DescribeKey",
"kms:ListKeyPolicies",
"kms:ListResourceTags"
],
"Resource": "arn:aws:kms:*:{account_id}:key/*"
},
{
"Sid": "ReadOnlyAccessForOperationsWithNoKMSKey",
"Effect": "Allow",
"Action": [
"kms:ListKeys",
"kms:ListAliases",
"tag:GetResources"
],
"Resource": "*"
}
]
"Sid": "ReadOnlyAccessForAllKMSKeysInAccount"
Grants read-only access to individual KMS key metadata, scoped to the account.
Policy Action |
Description |
|---|---|
kms:GetPublicKey |
Retrieve the public key of an asymmetric KMS key. |
kms:GetKeyRotationStatus |
Check whether automatic key rotation is enabled for a key. |
kms:GetKeyPolicy |
View the resource-based policy attached to a KMS key. |
kms:DescribeKey |
Retrieve metadata about a KMS key (creation date, state, key spec). |
kms:ListKeyPolicies |
List the names of key policies attached to a key. |
kms:ListResourceTags |
View tags associated with a KMS key. |
"Sid": "ReadOnlyAccessForOperationsWithNoKMSKey"
Grants account-wide discovery actions that don't target a specific key. tag:GetResources belongs here because the Resource Groups Tagging API does not support resource-level permissions, so it only takes effect with "Resource": "*".
Policy Action |
Description |
|---|---|
kms:ListKeys |
List all KMS key IDs in the account. |
kms:ListAliases |
List all key aliases for easy identification of keys by name. |
tag:GetResources |
Query resources by tag across services (supports KMS key discovery by tag). |
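To keep a "read-only" template honest over time, a naming-based lint can flag any allowed action that does not start with Get, List, or Describe. This is a hypothetical sketch, and the prefix rule is a heuristic rather than a guarantee: not every Get* action is harmless. Flagged and unflagged actions alike should still be reviewed against the Service Authorization Reference.

```python
from typing import List

# Naming heuristic: read-only AWS actions usually start with one of these verbs.
READ_ONLY_PREFIXES = ("Get", "List", "Describe")


def non_read_only_actions(policy: List[dict]) -> List[str]:
    """Return allowed actions whose names do not look read-only."""
    flagged = []
    for stmt in policy:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        for action in actions:
            name = action.split(":", 1)[1]  # "kms:DescribeKey" -> "DescribeKey"
            if not name.startswith(READ_ONLY_PREFIXES):
                flagged.append(action)  # includes wildcards such as "kms:*"
    return flagged


# KMS actions from the sample permissions above (Resource omitted).
kms_read_only = [
    {"Effect": "Allow", "Action": ["kms:GetPublicKey", "kms:GetKeyRotationStatus",
                                   "kms:GetKeyPolicy", "kms:DescribeKey",
                                   "kms:ListKeyPolicies", "kms:ListResourceTags"]},
    {"Effect": "Allow", "Action": ["kms:ListKeys", "kms:ListAliases"]},
]
print(non_read_only_actions(kms_read_only))  # []
print(non_read_only_actions([{"Effect": "Allow", "Action": ["kms:Decrypt", "kms:*"]}]))
```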
Note: AWS does not provide a managed KMS read-only policy (there is no AWSKeyManagementServiceReadOnlyAccess managed policy), so this access level is implemented as a custom policy template.
Trusted Advisor Policies¶
Trusted Advisor policies control read-only access to AWS Trusted Advisor checks and recommendations. Trusted Advisor evaluates your account against best practices for cost optimization, performance, security, fault tolerance, and service limits.
Level 1: read-only¶
Purpose: View Trusted Advisor check results and recommendations without the ability to refresh checks or modify preferences.
Typical Users:
FinOps managers (review cost optimization and performance recommendations)
What You Can Do:
✅ View all Trusted Advisor check details and summaries
✅ View flagged resources for each check
✅ View account Support plan and Trusted Advisor preferences
What You Cannot Do:
❌ No Refresh: Cannot refresh Trusted Advisor checks
❌ No Modifications: Cannot modify Trusted Advisor preferences or notification settings
❌ No Priority Access: Does not include Trusted Advisor Priority features (separate policy)
Sample Permissions:
[
{
"Sid": "TrustedAdvisorReadOnlyAccess",
"Effect": "Allow",
"Action": [
"trustedadvisor:DescribeChecks",
"trustedadvisor:DescribeCheckSummaries",
"trustedadvisor:DescribeCheckItems",
"trustedadvisor:DescribeAccount"
],
"Resource": "*"
}
]
"Sid": "TrustedAdvisorReadOnlyAccess"
Grants read-only access to Trusted Advisor checks and account information.
Policy Action |
Description |
|---|---|
trustedadvisor:DescribeChecks |
View details for all Trusted Advisor checks. |
trustedadvisor:DescribeCheckSummaries |
View summaries of check results. |
trustedadvisor:DescribeCheckItems |
View specific details for flagged resources. |
trustedadvisor:DescribeAccount |
View Support plan and Trusted Advisor preferences. |
Note: AWS does not provide a managed Trusted Advisor read-only policy (there is no AWSTrustedAdvisorReadOnlyAccess managed policy). The closest alternatives are AWSTrustedAdvisorPriorityReadOnlyAccess (scoped to Priority features only) and AWSSupportAccess (broader, includes check refresh). This custom template provides least-privilege read-only access.
Combined Policies
Combined policies merge multiple service-level policies into one or more consolidated managed policies. They are required when a group needs access across many services and would otherwise exceed the AWS hard limit of 10 managed policies per group.
Why Combined Policies Exist
AWS IAM enforces a hard cap of 10 managed policies per group (it cannot be increased). Groups like operations_support need:
Multiple AWS managed read-only policies (CloudWatch, X-Ray, KMS, etc.)
Multiple customer managed service-level policies (S3, ECR, SageMaker, Lambda, Bedrock)
When the total exceeds 10, or sits at the limit with no headroom, we consolidate the customer managed service-level policies into as few combined policies as the 6,144-character size limit allows.
Key Constraints
| Constraint | Limit |
|---|---|
| Managed policies per group | 10 (hard cap, no increase) |
| Managed policy size | 6,144 characters (JSON) |
| Customer managed policies per account | 1,500 (can increase to 5,000) |
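Because the size limit drives every split decision below, it helps to measure templates the same way each time. This Python sketch approximates the 6,144-character check. It assumes, as we read AWS's IAM quotas documentation, that whitespace is not counted, so it serializes the policy compactly and skips any remaining whitespace. Treat the result as an estimate, not AWS's exact algorithm; the function names are hypothetical.

```python
import json

MANAGED_POLICY_MAX_CHARS = 6144  # "Managed policy size" row above


def policy_size(policy: dict) -> int:
    """Approximate the size IAM checks against the managed policy limit.

    Assumption: whitespace is not counted, so serialize compactly and then
    drop any remaining whitespace characters (including any inside string
    values, which may slightly under-count in rare cases).
    """
    compact = json.dumps(policy, separators=(",", ":"))
    return sum(1 for ch in compact if not ch.isspace())


def check_policy_size(name: str, policy: dict) -> int:
    """Return the measured size, or raise if it exceeds the managed policy limit."""
    size = policy_size(policy)
    if size > MANAGED_POLICY_MAX_CHARS:
        raise ValueError(f"{name}: {size} chars exceeds {MANAGED_POLICY_MAX_CHARS}")
    return size


if __name__ == "__main__":
    example = {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "AllowListAllBuckets", "Effect": "Allow",
             "Action": "s3:ListAllMyBuckets", "Resource": "*"}
        ],
    }
    print(check_policy_size("example", example))
```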
Design Rules
Individual service-level templates still exist: they are reused by other groups that don't hit the limit.
Combined policies are standalone templates: the generator treats them like any other policy, with no special merge logic.
Combined policies are group-specific: each is named for the group it serves (e.g., ops-services-read-only).
Only combine when forced: if a group is within the 10-policy limit, use individual service-level policies.
ops-services-read-only
Purpose: Consolidates 5 service-level read-only policies into a single managed policy for the operations_support group.
Replaces:
s3: level1 (read-only)
ecr: level1 (read-only)
sagemaker: level1 (read-only)
lambda: level1 (invoke-only)
bedrock: level1 (invoke-only)
Policy Budget (operations_support):
| Before | Count | After | Count |
|---|---|---|---|
| AWS managed policies | 7 | AWS managed policies | 7 |
| Customer managed (5 individual) | 5 | Customer managed (1 combined) | 1 |
| Total | 12 (exceeds the 10-policy cap) | Total | 8 (within the cap) |
Size Check: ~3,856 characters (JSON), well within the 6,144-character limit.
Sids (14 total):
| Service | Sids | Source Template |
|---|---|---|
| S3 | AllowListAllBuckets, AllowReadAndVersionAccess | s3/level1-read-only |
| ECR | AllowECRAuth, AllowReadOnlyPullAndMetadata | ecr/level1-read-only |
| SageMaker | SageMakerEndpointReadOnly, CloudWatchMetricsReadOnly, AutoScalingReadOnly, ExplicitDenyInference | sagemaker/level1-read-only |
| Lambda | LambdaDiscoveryListActions, LambdaDiscoveryActions, LambdaInvocationActions | lambda/level1-invoke-only |
| Bedrock | BedrockDiscovery, BedrockStandardInference, BedrockConverseInference | bedrock/level1-invoke-only |
Template Location: policies/templates/combined/ops-services-read-only.yaml
Config Usage:
```yaml
operations_support:
  managed_policies:
    - CloudWatchReadOnlyAccess
    - CloudWatchLogsReadOnlyAccess
    - AWSXrayReadOnlyAccess
    - AWSKeyManagementServiceReadOnlyAccess
    - ServiceQuotasReadOnlyAccess
    - IAMReadOnlyAccess
    - AmazonSNSReadOnlyAccess
  policy_assignments:
    combined: ops-services-read-only
```
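A pre-flight check can catch a group that would exceed the 10-policy cap before anything is applied. The Python sketch below represents the operations_support entry above as the dictionary a YAML loader (for example PyYAML, not shown) would return, and assumes each value under policy_assignments becomes exactly one customer managed policy. The function names are hypothetical, not part of the existing generator.

```python
GROUP_POLICY_CAP = 10  # AWS hard cap on managed policies per IAM group

# The operations_support entry above, as a YAML loader would return it.
OPERATIONS_SUPPORT = {
    "managed_policies": [
        "CloudWatchReadOnlyAccess",
        "CloudWatchLogsReadOnlyAccess",
        "AWSXrayReadOnlyAccess",
        "AWSKeyManagementServiceReadOnlyAccess",
        "ServiceQuotasReadOnlyAccess",
        "IAMReadOnlyAccess",
        "AmazonSNSReadOnlyAccess",
    ],
    "policy_assignments": {"combined": "ops-services-read-only"},
}


def attached_policy_count(group: dict) -> int:
    """AWS managed policies plus one customer managed policy per assignment.

    Assumption: every value under policy_assignments becomes exactly one
    customer managed policy attached to the group.
    """
    return len(group.get("managed_policies", [])) + len(group.get("policy_assignments", {}))


def check_group_budget(name: str, group: dict, cap: int = GROUP_POLICY_CAP) -> int:
    """Return the remaining headroom, or raise if the group exceeds the cap."""
    count = attached_policy_count(group)
    if count > cap:
        raise ValueError(f"{name}: {count} managed policies exceeds the cap of {cap}")
    return cap - count


if __name__ == "__main__":
    print(attached_policy_count(OPERATIONS_SUPPORT))  # 7 AWS managed + 1 combined = 8
    print(check_group_budget("operations_support", OPERATIONS_SUPPORT))  # headroom of 2
```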
Maintenance Note: If any of the 5 source service-level templates change (e.g., a new action is added to S3 level1), the combined policy must be updated manually to stay in sync. This is an accepted trade-off because operations_support read-only policies change infrequently.
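Part of that manual sync can be checked automatically. The Python sketch below compares each statement in the source service-level templates with the same-Sid statement in the combined policy and reports Sids that are missing, changed, or extra, plus any duplicated Sids (IAM documentation asks for Sids to be unique within a policy). It assumes the templates have already been rendered to policy JSON, since the YAML template format is not shown here; all names are hypothetical.

```python
def _normalize(statement: dict) -> dict:
    """Make list-valued fields order-insensitive so reordering is not reported as drift."""
    normalized = dict(statement)
    for key in ("Action", "NotAction", "Resource", "NotResource"):
        value = normalized.get(key)
        if isinstance(value, str):
            value = [value]
        if isinstance(value, list):
            normalized[key] = sorted(value)
    return normalized


def find_drift(combined: dict, sources: list) -> dict:
    """Compare every source statement with the same-Sid statement in the combined policy."""
    combined_by_sid = {s["Sid"]: _normalize(s) for s in combined["Statement"]}
    source_sids = set()
    missing, changed = [], []
    for source in sources:
        for statement in source["Statement"]:
            sid = statement["Sid"]
            source_sids.add(sid)
            if sid not in combined_by_sid:
                missing.append(sid)
            elif combined_by_sid[sid] != _normalize(statement):
                changed.append(sid)
    extra = sorted(set(combined_by_sid) - source_sids)  # in combined but in no source
    return {"missing": missing, "changed": changed, "extra": extra}


def duplicate_sids(policy: dict) -> list:
    """Return Sids that appear more than once in a policy."""
    seen, duplicates = set(), []
    for statement in policy["Statement"]:
        sid = statement.get("Sid")
        if sid is None:
            continue
        if sid in seen and sid not in duplicates:
            duplicates.append(sid)
        seen.add(sid)
    return duplicates
```

Running this in CI whenever a file under policies/templates/ changes would turn the "review when modified" rule into a failing check rather than a reminder.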
mlops-services-a / b / c
Purpose: Consolidates 6 service-level deployment policies into 3 combined policies for the mlops_engineers group. Split into 3 because the 6 services combined exceed the 6,144 character managed policy size limit.
Split:
| Policy | Services | Chars | Sids |
|---|---|---|---|
| mlops-services-a | S3 level2, ECR level3, Pipeline level3, SageMaker level3 | ~5,069 | 15 |
| mlops-services-b | Lambda level2 | ~3,455 | 8 |
| mlops-services-c | Bedrock level2 | ~2,896 | 8 |
Replaces:
s3: level2 (project-buckets-only)
ecr: level3 (ci-read-write)
pipeline: level3 (project-ci)
sagemaker: level3 (prod-invoke)
lambda: level2 (deploy-manage)
bedrock: level2 (model-manage)
Policy Budget (mlops_engineers):
| Before | Count | After | Count |
|---|---|---|---|
| AWS managed policies | 4 | AWS managed policies | 4 |
| Customer managed (6 individual) | 6 | Customer managed (3 combined) | 3 |
| Total | 10 (at the cap, no headroom) | Total | 7 (within the cap) |
Template Locations:
policies/templates/combined/mlops-services-a.yaml
policies/templates/combined/mlops-services-b.yaml
policies/templates/combined/mlops-services-c.yaml
Config Usage:
```yaml
mlops_engineers:
  managed_policies:
    - AmazonECS_FullAccess
    - AWSCodeDeployFullAccess
    - AWSServiceCatalogEndUserFullAccess
    - CloudWatchLogsReadOnlyAccess
  policy_assignments:
    combined_a: mlops-services-a
    combined_b: mlops-services-b
    combined_c: mlops-services-c
```
Why 3 policies instead of 1? The 6 services combined produce ~11,320 characters of JSON, nearly double the 6,144-character managed policy size limit. The S3, ECR, Pipeline, and SageMaker statements fit together (~5,069 chars), but adding either Lambda level2 (~3,455 chars) or Bedrock level2 (~2,896 chars) would push that policy over the limit, and Lambda plus Bedrock together (~6,351 chars) also exceed it, so each gets its own policy.
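The reasoning above is a small bin-packing problem. The Python sketch below applies first-fit decreasing, a common packing heuristic (not necessarily how the split was originally chosen), to the sizes in the Split table. The four smaller services are treated as one block because their individual sizes are not listed, and the sketch ignores the Version/Statement wrapper that each separate policy adds.

```python
MAX_CHARS = 6144  # managed policy size limit


def pack_blocks(blocks: dict, limit: int = MAX_CHARS) -> list:
    """First-fit decreasing: put each block into the first policy that still has room."""
    policies = []  # each entry: {"blocks": [names...], "chars": running total}
    for name, size in sorted(blocks.items(), key=lambda item: item[1], reverse=True):
        if size > limit:
            raise ValueError(f"{name} ({size} chars) cannot fit in any single policy")
        for policy in policies:
            if policy["chars"] + size <= limit:
                policy["blocks"].append(name)
                policy["chars"] += size
                break
        else:  # no existing policy had room, so open a new one
            policies.append({"blocks": [name], "chars": size})
    return policies


if __name__ == "__main__":
    # Approximate sizes from the Split table above.
    sizes = {
        "s3+ecr+pipeline+sagemaker": 5069,
        "lambda-level2": 3455,
        "bedrock-level2": 2896,
    }
    for i, policy in enumerate(pack_blocks(sizes), start=1):
        print(i, policy["blocks"], policy["chars"])
```

With these sizes no two blocks fit under 6,144 characters together, so the heuristic lands on the same three policies as mlops-services-a, -b and -c.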
Maintenance Note: If any of the 6 source service-level templates change, the corresponding combined policy must be updated manually. Review when any source template is modified.
Assignment Recommendations
Typical Team Structure
Storage & Pipeline
| Role | S3 | ECR | Pipeline |
|---|---|---|---|
| Junior Data Scientist | read-only | read-only | read-only |
| Data Scientist | project-buckets-only | read-only | read-only |
| Senior Data Scientist | project-buckets-full | read-only | read-only |
| ML Engineer | project-buckets-full | dev-read-write | project-dev |
| MLOps Engineer | full | full | full |
| Backend Developer | - | read-only | - |
| Auditor / Compliance | read-only | read-only | read-only |
| Model Risk Manager | read-only | read-only | read-only |
| Executive / Stakeholder | - | - | read-only |
| CI/CD Pipeline (Role) | project-buckets-only | ci-read-write | project-ci |
| Platform Admin | full | full | full |
Inference
| Role | SageMaker | Lambda | Bedrock |
|---|---|---|---|
| Junior Data Scientist | read-only | - | invoke-only |
| Data Scientist | dev-invoke | - | invoke-only |
| Senior Data Scientist | dev-invoke | - | invoke-only |
| ML Engineer | dev-invoke | deploy-manage | model-manage |
| MLOps Engineer | full | full | full |
| Backend Developer | prod-invoke | - | invoke-only |
| Auditor / Compliance | read-only | - | invoke-only |
| Model Risk Manager | read-only | - | invoke-only |
| Executive / Stakeholder | - | - | invoke-only |
| CI/CD Pipeline (Role) | deploy-only | deploy-manage | invoke-only |
| Platform Admin | full | full | full |
Assignment Best Practices
Start Minimal - Begin with read-only, expand based on actual needs
Time-Bound Elevation - Grant temporary full access for specific tasks, then revoke
Project Isolation - Use project-only levels to prevent cross-team interference
Separate Humans from Automation - Use dev-read-write for users, ci-read-write for roles
Regular Reviews - Audit access quarterly, remove unused permissions
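The "Separate Humans from Automation" rule is easy to check mechanically during a quarterly review. The Python sketch below flags an IAM user holding a CI-only level, or an IAM role holding a human-only level, using the ECR and Pipeline level names from this guide. The input shape (a dict from policy type to assigned level) is an assumption for illustration, not the generator's config format.

```python
# Levels this guide reserves for humans (IAM users) and for automation (IAM roles).
HUMAN_LEVELS = {"ecr": "dev-read-write", "pipeline": "project-dev"}
AUTOMATION_LEVELS = {"ecr": "ci-read-write", "pipeline": "project-ci"}


def separation_violations(principal_type: str, assignments: dict) -> list:
    """Return "policy_type:level" entries that break the humans-vs-automation split.

    principal_type is "user" (a human) or "role" (automation); assignments
    maps a policy type such as "ecr" to its assigned level.
    """
    if principal_type == "user":
        forbidden = AUTOMATION_LEVELS
    elif principal_type == "role":
        forbidden = HUMAN_LEVELS
    else:
        raise ValueError(f"unknown principal type: {principal_type!r}")
    return sorted(
        f"{policy_type}:{level}"
        for policy_type, level in assignments.items()
        if forbidden.get(policy_type) == level
    )


if __name__ == "__main__":
    print(separation_violations("user", {"ecr": "ci-read-write", "s3": "read-only"}))
    print(separation_violations("role", {"ecr": "ci-read-write", "pipeline": "project-ci"}))
```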
Troubleshooting
Common AccessDenied Scenarios
"Access Denied when uploading to S3"
Error:
An error occurred (AccessDenied) when calling the PutObject operation
Cause: You have read-only S3 access
Solution: Request project-buckets-only or higher
"Access Denied when deleting S3 objects"
Error:
An error occurred (AccessDenied) when calling the DeleteObject operation
Cause: You have project-buckets-only (no delete permission)
Solution: Request project-buckets-full access
"Access Denied when pushing to ECR"
Error:
denied: User: arn:aws:iam::123456789012:user/john is not authorized to perform: ecr:PutImage
Cause: You have read-only ECR access
Solution: Request dev-read-write access (if you're a human) or ci-read-write (if you're a CI/CD pipeline)
"Access Denied when invoking a SageMaker endpoint"
Error:
An error occurred (AccessDeniedException) when calling the InvokeEndpoint operation
Cause: You don't have an inference policy, or the endpoint is in an environment your policy doesn't cover
Solution:
For production endpoints: Request prod-invoke access
For dev/staging endpoints: Request dev-invoke access
"Cannot authenticate to ECR"
Error:
Error response from daemon: Get https://123456789012.dkr.ecr.us-east-1.amazonaws.com/v2/: no basic auth credentials
Cause: Missing GetAuthorizationToken permission, or an expired token (ECR authorization tokens are valid for 12 hours)
Solution:
Verify you have any ECR policy level (all levels include GetAuthorizationToken)
Re-run: aws ecr get-login-password | docker login ...
Check that your AWS credentials are valid: aws sts get-caller-identity
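If docker login keeps failing, it can help to inspect the token itself. ECR's GetAuthorizationToken returns a base64-encoded user:password string (the user is AWS) together with an expiresAt timestamp. The Python sketch below decodes the token and checks expiry; only the commented-out lines touch boto3, and the helper names are hypothetical.

```python
import base64
from datetime import datetime, timezone
from typing import Optional, Tuple


def decode_ecr_token(authorization_token: str) -> Tuple[str, str]:
    """Split a base64-encoded "user:password" ECR token into (user, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


def token_expired(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once the token's expiry time has passed (expires_at must be timezone-aware)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at


# Example wiring with boto3 (needs AWS credentials, so it is left commented out):
# import boto3
# data = boto3.client("ecr").get_authorization_token()["authorizationData"][0]
# user, _ = decode_ecr_token(data["authorizationToken"])
# print(user, "expired:", token_expired(data["expiresAt"]))
```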
"Access Denied when starting or stopping a pipeline"
Error:
An error occurred (AccessDeniedException) when calling the StartPipelineExecution operation
Cause: You have read-only Pipeline access
Solution: Request project-dev (for development pipelines) or project-ci (for CI/CD roles)
"Access Denied when invoking a Lambda function"
Error:
An error occurred (AccessDeniedException) when calling the Invoke operation
Cause: You don't have Lambda inference access, or the function name doesn't match your policy's resource scope
Solution: Request Lambda deploy-manage access. Verify the function follows the {company_prefix}-{env}-* naming convention.
"Access Denied when deleting a Lambda function"
Error:
An error occurred (AccessDeniedException) when calling the DeleteFunction operation
Cause: You have deploy-manage (Level 2), which explicitly denies delete operations
Solution: Delete operations require Lambda full (Level 3). Contact your platform administrator.
"Access Denied when invoking a Bedrock foundation model"
Error:
An error occurred (AccessDeniedException) when calling the InvokeModel operation
Cause: You don't have Bedrock inference access, or the model hasn't been enabled for the account
Solution:
Verify you have at least invoke-only access
Check that the model is enabled: someone with model-manage access must accept the model agreement first
"Access Denied when creating or deleting a Bedrock guardrail"
Error:
An error occurred (AccessDeniedException) when calling the CreateGuardrail operation
Cause: You have invoke-only (Level 1), which doesn't include guardrail management
Solution:
To create/update guardrails: Request model-manage access
To delete guardrails: Request full access (Level 3); Level 2 explicitly denies delete operations
"Access Denied when creating provisioned throughput in Bedrock"
Error:
An error occurred (AccessDeniedException) when calling the CreateProvisionedModelThroughput operation
Cause: Provisioned throughput is a cost-impacting operation reserved for Level 3
Solution: Request Bedrock full access. This is typically restricted to platform admins and FinOps engineers.
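The entries above share a pattern: the operation named in the AWS error points to the level to request. The Python sketch below hand-copies a few of those mappings from this section and extracts the operation with a regular expression. The mapping is illustrative and would need to be kept in line with this guide; cases that depend on context, such as InvokeEndpoint (prod vs dev), are deliberately left out.

```python
import re
from typing import Optional

# Hand-copied from the troubleshooting entries above; extend as cases are documented.
SUGGESTED_LEVEL = {
    "PutObject": "S3 project-buckets-only (or higher)",
    "DeleteObject": "S3 project-buckets-full",
    "StartPipelineExecution": "Pipeline project-dev (humans) or project-ci (CI/CD roles)",
    "DeleteFunction": "Lambda full (Level 3)",
    "CreateGuardrail": "Bedrock model-manage",
    "CreateProvisionedModelThroughput": "Bedrock full",
}

# AWS CLI / SDK error text contains "... when calling the <Operation> operation".
_OPERATION = re.compile(r"when calling the (\w+) operation")


def suggest_access_level(error_message: str) -> Optional[str]:
    """Return the level to request for a denied operation, or None if unknown."""
    match = _OPERATION.search(error_message)
    if match is None:
        return None
    return SUGGESTED_LEVEL.get(match.group(1))


if __name__ == "__main__":
    msg = "An error occurred (AccessDenied) when calling the DeleteObject operation"
    print(suggest_access_level(msg))  # S3 project-buckets-full
```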
"Access Denied when passing a role (PassRole)"
Error:
An error occurred (AccessDenied) when calling the CreateFunction operation: User is not authorized to perform: iam:PassRole
Cause: Either your policy doesn't include PassRole, or the role ARN doesn't match the {company_prefix}-{env}-*-role-* pattern
Solution:
Verify the role follows the naming convention: {company_prefix}-{env}-*-role-*
Verify the PassRole condition matches the target service (e.g., lambda.amazonaws.com, bedrock.amazonaws.com)
If the role name is correct, request the appropriate access level that includes PassRole
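When debugging, it can help to reproduce both PassRole checks locally. The Python sketch below approximates them: a wildcard match of the role name against {company_prefix}-{env}-*-role-*, and a set-membership test standing in for the iam:PassedToService condition. It works on role names rather than full role ARNs, and the prefix acme-mlops is only an example value.

```python
from fnmatch import fnmatchcase


def passrole_allowed(role_name: str, passed_to_service: str, company_prefix: str,
                     env: str, allowed_services: set) -> bool:
    """Approximate the two PassRole checks from the entry above.

    1. The role name must match {company_prefix}-{env}-*-role-*; like IAM's
       "*" wildcard, fnmatchcase's "*" matches any run of characters.
    2. The receiving service must be allowed (the iam:PassedToService condition).
    """
    pattern = f"{company_prefix}-{env}-*-role-*"
    return fnmatchcase(role_name, pattern) and passed_to_service in allowed_services


if __name__ == "__main__":
    services = {"lambda.amazonaws.com"}
    print(passrole_allowed("acme-mlops-dev-inference-role-lambda",
                           "lambda.amazonaws.com", "acme-mlops", "dev", services))
```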
"Action explicitly denied despite having Allow permissions"
Error:
An error occurred (AccessDeniedException) when calling the DeleteGuardrail operation: User is not authorized to perform: bedrock:DeleteGuardrail with an explicit deny
Cause: Your policy level includes an explicit Deny statement that overrides any Allow. Lambda deploy-manage (Level 2) and Bedrock model-manage (Level 2) both include Deny blocks for destructive actions.
Solution: An explicit Deny cannot be overridden by an Allow; this is by design. You need the full (Level 3) policy, which removes the Deny block. Contact your platform administrator.
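The precedence rule behind this error can be shown in a few lines. The Python sketch below is a deliberately simplified model of IAM's decision for a single action: an explicit Deny wins, otherwise any matching Allow grants access, and no match is an implicit deny. It ignores Resource, Condition, NotAction, SCPs, permissions boundaries, and session policies, and the Level 2 style statements in the test are hypothetical, not copied from the real templates.

```python
from fnmatch import fnmatchcase


def _matches(action: str, statement: dict) -> bool:
    patterns = statement["Action"]
    if isinstance(patterns, str):
        patterns = [patterns]
    # IAM action names are case-insensitive, so compare in lower case.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def evaluate(action: str, statements: list) -> str:
    """Simplified IAM decision: explicit Deny beats Allow; no match is an implicit deny.

    Pass the statements from all attached policies together, because an
    explicit Deny in one policy overrides an Allow in another.
    """
    matching = [s for s in statements if _matches(action, s)]
    if any(s["Effect"] == "Deny" for s in matching):
        return "ExplicitDeny"
    if any(s["Effect"] == "Allow" for s in matching):
        return "Allow"
    return "ImplicitDeny"


if __name__ == "__main__":
    level2 = [  # hypothetical Level 2 style statements
        {"Effect": "Allow", "Action": ["bedrock:CreateGuardrail", "bedrock:DeleteGuardrail"]},
        {"Effect": "Deny", "Action": "bedrock:Delete*"},
    ]
    for action in ("bedrock:CreateGuardrail", "bedrock:DeleteGuardrail",
                   "bedrock:CreateProvisionedModelThroughput"):
        print(action, evaluate(action, level2))
```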
Security Best Practices
1. Principle of Least Privilege
Do:
✅ Start with read-only access
✅ Grant write access only when needed
✅ Use project-only scopes when possible
✅ Limit production access to specific roles
Don't:
❌ Give everyone full access "just in case"
❌ Use all-environments when production-only suffices
❌ Grant delete permissions without justification
2. Separation of Duties
Do:
✅ Assign dev-read-write to IAM users (humans)
✅ Assign ci-read-write to IAM roles (automation)
✅ Keep development and production access separate
✅ Require different people for deployment approval
Don't:
❌ Use the same credentials for humans and CI/CD
❌ Give developers direct production write access
❌ Allow automated systems to have full admin rights
3. Audit and Monitoring
Do:
✅ Enable CloudTrail logging (always on)
✅ Review access logs quarterly
✅ Set up alerts for sensitive actions (DeleteBucket, DeleteRepository)
✅ Monitor for unusual access patterns
Don't:
❌ Ignore CloudTrail logs
❌ Share IAM credentials between team members
❌ Disable logging to "improve performance"
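CloudTrail delivers log files to S3 as gzip-compressed JSON objects with a top-level Records array, where each record carries fields such as eventTime, eventName, and userIdentity. As a minimal sketch of the "alert on sensitive actions" item, the Python below scans one decompressed log file for the event names listed above. For near-real-time alerts an EventBridge rule or CloudWatch metric filter is the more usual route; this batch scan and its function name are illustrative only.

```python
import json

# Event names called out in the Audit and Monitoring checklist.
SENSITIVE_EVENTS = frozenset({"DeleteBucket", "DeleteRepository"})


def find_sensitive_events(log_file_text: str, watched=SENSITIVE_EVENTS) -> list:
    """Return (eventTime, eventName, principal ARN) for watched events in a CloudTrail log file."""
    records = json.loads(log_file_text).get("Records", [])
    hits = []
    for record in records:
        name = record.get("eventName")
        if name in watched:
            principal = record.get("userIdentity", {}).get("arn", "unknown")
            hits.append((record.get("eventTime", ""), name, principal))
    return hits


if __name__ == "__main__":
    sample = json.dumps({"Records": [
        {"eventTime": "2024-05-01T12:00:00Z", "eventSource": "s3.amazonaws.com",
         "eventName": "DeleteBucket",
         "userIdentity": {"arn": "arn:aws:iam::123456789012:user/john"}},
        {"eventTime": "2024-05-01T12:01:00Z", "eventSource": "s3.amazonaws.com",
         "eventName": "ListBuckets",
         "userIdentity": {"arn": "arn:aws:iam::123456789012:user/john"}},
    ]})
    print(find_sensitive_events(sample))
```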
4. Credential Management
Do:
✅ Use IAM roles for EC2/ECS/Lambda (no hardcoded keys)
✅ Rotate access keys every 90 days
✅ Use temporary credentials (STS AssumeRole) when possible
✅ Store secrets in AWS Secrets Manager, not code
Don't:
❌ Hardcode AWS credentials in code or Docker images
❌ Commit credentials to Git repositories
❌ Share access keys via email or Slack
❌ Use root account credentials for daily work
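The 90-day rotation rule can be checked against the metadata IAM already returns. The Python sketch below takes items shaped like the AccessKeyMetadata entries from boto3's iam.list_access_keys() (AccessKeyId, Status, CreateDate) and returns the active keys that are overdue. The function name and the example user john are hypothetical, and the boto3 lines are commented out so the sketch runs offline.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

MAX_KEY_AGE = timedelta(days=90)  # rotation interval from the checklist above


def keys_due_for_rotation(keys: Iterable[dict], now: Optional[datetime] = None) -> List[str]:
    """Return AccessKeyIds of active keys older than MAX_KEY_AGE.

    Each item mirrors an AccessKeyMetadata entry: AccessKeyId, Status, and a
    timezone-aware CreateDate.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [
        key["AccessKeyId"]
        for key in keys
        if key.get("Status") == "Active" and now - key["CreateDate"] > MAX_KEY_AGE
    ]


# With real credentials:
# import boto3
# metadata = boto3.client("iam").list_access_keys(UserName="john")["AccessKeyMetadata"]
# print(keys_due_for_rotation(metadata))
```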
5. Environment Isolation
Do:
✅ Use separate AWS accounts for dev/staging/prod (ideal)
✅ Use resource naming conventions (e.g., acme-mlops-dev-, acme-mlops-prod-)
✅ Restrict production access to specific IAM principals
✅ Require MFA for production access
Don't:
❌ Mix dev and prod resources in the same bucket/repository
❌ Allow dev pipelines to access prod endpoints
❌ Use the same IAM role across all environments
6. Explicit Deny for Destructive Actions
Do:
✅ Use Deny blocks at intermediate levels (Level 2) to prevent accidental deletion
✅ Reserve delete operations for full (Level 3) principals only
✅ Include all service-specific delete actions in the Deny block (not just the obvious ones)
✅ Document which Deny block is active at each level so users understand why an Allow doesn't take effect
Don't:
❌ Rely on "absence of Allow" as a safety mechanism; an explicit Deny is stronger
❌ Add Deny blocks at Level 3 (full); that defeats the purpose of full access
❌ Forget that an explicit Deny overrides any Allow, even from other attached policies
7. AI/ML Service Governance
Do:
✅ Scope Bedrock model access using config-driven allowed_models lists per tier
✅ Enforce guardrails on all production inference workloads before launch
✅ Restrict provisioned throughput creation to FinOps-approved principals (Level 3 only)
✅ Scope PassRole to tenant-prefixed roles ({company_prefix}-{env}-*-role-*) with service conditions
✅ Use separate policy levels for model invocation vs model management
Don't:
❌ Grant bedrock:* to non-admin roles; provisioned throughput can incur significant cost
❌ Allow unrestricted PassRole; it is one of the most common privilege escalation vectors
❌ Skip guardrail configuration for production Bedrock workloads
❌ Let automation roles manage model access agreements; keep that as a human decision
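One way to implement config-driven allowed_models scoping is to generate the Resource list from the config instead of using a wildcard. The Python sketch below assumes a simple tier-to-model-ID mapping; the structure, tier keys, Sid, and model IDs are illustrative, not the generator's real schema. It grants bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream on foundation-model ARNs, which have an empty account field, and refuses to emit a statement for a tier with no models.

```python
from typing import Dict, List

# Hypothetical allowed_models config per tier; the model IDs are examples only.
ALLOWED_MODELS: Dict[str, List[str]] = {
    "invoke-only": ["amazon.titan-text-express-v1"],
    "model-manage": [
        "amazon.titan-text-express-v1",
        "anthropic.claude-3-haiku-20240307-v1:0",
    ],
}


def foundation_model_arn(model_id: str, region: str) -> str:
    # Foundation model ARNs leave the account ID empty ("::").
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


def invoke_statement(tier: str, region: str,
                     config: Dict[str, List[str]] = ALLOWED_MODELS) -> dict:
    """Build an Allow statement limited to the tier's allowed models."""
    models = config.get(tier, [])
    if not models:
        raise ValueError(f"tier {tier!r} has no allowed_models; refusing to emit an empty grant")
    return {
        "Sid": "InvokeAllowedModels",  # hypothetical Sid
        "Effect": "Allow",
        "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
        "Resource": [foundation_model_arn(m, region) for m in models],
    }


if __name__ == "__main__":
    print(invoke_statement("invoke-only", "us-east-1"))
```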
Getting Help
Request Access Changes
Contact your MLOps platform administrator with:
Current access level - What you have now
Requested access level - What you need
Justification - Why you need it (specific use case)
Duration - Permanent or temporary (e.g., 2 weeks for project)
Report Security Issues
If you discover:
Overly permissive policies
Credentials in code or logs
Unauthorized access attempts
Compliance violations
Contact: security@your-company.com (replace with your security team contact)
Appendix: Policy Type Summary
S3 Policies
read-only - Safe exploration, no modifications
project-buckets-only - Standard work, no deletion
project-buckets-full - Senior users, cleanup capability
full - Platform admins only
ECR Policies
read-only - Pull images for local testing
dev-read-write - Humans pushing images manually
ci-read-write - Automation pushing images
full - Repository management
Pipeline Policies
read-only - View pipelines, logs, history (governance/audit)
project-dev - Humans creating/running pipelines (IAM users)
project-ci - Automation creating/running pipelines (IAM roles)
full - Platform-wide management
Inference Policies
SageMaker
read-only - View endpoint status/config (no invoke, no cost)
dev-invoke - Invoke dev/staging endpoints for testing
prod-invoke - Invoke production endpoints only
full - Complete endpoint lifecycle management
Lambda
read-only - View function config/status (no invoke)
deploy-manage - Deploy, update, and invoke functions (no delete)
full - Complete function lifecycle management
Bedrock
invoke-only - Call foundation models and list available models
model-manage - Manage model access, guardrails, imports, cross-region inference (no delete, no throughput)
full - Complete Bedrock platform management including provisioned throughput
Document Version: 1.0
Last Updated: 2024
Maintained By: MLOps Platform Team