Policy Guide¶

Table of Contents¶


Overview¶

This guide helps you understand and choose the right IAM policy access levels for your MLOps team. Each policy type (S3, ECR, Pipeline, Inference) offers multiple access levels designed around real-world use cases and security best practices.

Key Principles:

  • Least Privilege - Start with minimal access, expand only when needed

  • Separation of Duties - Different access for humans vs automation

  • Environment Isolation - Production access requires explicit permission

  • Audit Trail - All actions are CloudTrail-logged for compliance


Quick Reference¶

Policy Type

Access Levels

Typical Users

S3

read-only, project-buckets-only, project-buckets-full, full

Data Scientists, ML Engineers, Admins

ECR

read-only, dev-read-write, ci-read-write, full

Developers, CI/CD Pipelines, DevOps

Pipeline

read-only, project-dev, project-ci, full

ML Engineers, MLOps Admins, Auditors, CI/CD

Inference

read-only, read-only-invoke, dev-invoke, prod-invoke, full, deploy-only

Data Scientists, Backend Developers, Business Consumers, MLOps, CI/CD


S3 Policies¶

Your S3 bucket structure follows MLOps best practices with 130+ organized folders for datasets, models, artifacts, and logs.

Level 1: read-only¶

Purpose: This IAM policy grants a user basic read-only and discovery access to his/her S3 environment, but it restricts object-level interaction to specific buckets matching a naming pattern.

Typical Users:

  • Junior data scientists

  • Business analysts

  • Auditors and compliance reviewers

  • External consultants (read-only access)

What You Can Do:

  • ✅ Discover all buckets

  • ✅ View bucket metadata

  • ✅ List objects in specific buckets

  • ✅ Read files and history

  • ✅ See version history

"Sid": "AllowListAllBuckets"

Policy Action

Description

Discover all buckets

See a list of every S3 bucket in your AWS account via the console or CLI (s3:ListAllMyBuckets)

View bucket metadata

Retrieve the AWS region where any bucket is located (s3:GetBucketLocation), which is often required for the S3 Console to function correctly.

"Sid": "AllowReadAndVersionAccess"

Policy Action

Description

List objects in specific buckets

See the files and folders inside buckets that match the pattern arn:aws:s3:::{company_prefix}-{env}-{tenant_id}-*.

Read files and history

Download or view the content of objects and their historical versions (if versioning is enabled) within those specific matching buckets (s3:GetObject, s3:GetObjectVersion).

See version history

List the different versions of files within the allowed buckets (s3:ListBucketVersions).

What You Cannot Do:

  • ❌ No modifications

  • ❌ No permission changes

  • ❌ No access to other buckets’ content

  • ❌ No administrative tasks

Policy Action

Description

No modifications

Perform any “write” actions, such as uploading files (s3:PutObject), deleting files (s3:DeleteObject), or creating new buckets (s3:CreateBucket).

No permission changes

Modify bucket policies or Access Control Lists (ACLs) to change who else can access the data.

No access to other buckets’ content

While they can see the names of all buckets in the account, they cannot see the files inside or download anything from any bucket that doesn’t match the {company_prefix}-{env}-{tenant_id}-* prefix.

No administrative tasks

Cannot empty buckets, change lifecycle rules, or modify bucket settings like encryption or logging.

Example Scenario:

Sarah is a new data scientist who needs to explore existing datasets and model artifacts to understand the current ML pipeline. She doesn’t need to upload anything yet, just learn the landscape.

Sample Permissions:

[
  {
    "Sid": "AllowListAllBuckets",
    "Effect": "Allow",
    "Action": [
      "s3:ListAllMyBuckets",
      "s3:GetBucketLocation"
    ],
    "Resource": "arn:aws:s3:::*"
  },
  {
    "Sid": "AllowReadAndVersionAccess",
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:ListBucket",
      "s3:ListBucketVersions"
    ],
    "Resource": [
      "arn:aws:s3:::edge-prod-b001-*",
      "arn:aws:s3:::edge-prod-b001-*/*"
    ]
  }
]

Note: ListAllMyBuckets cannot be scoped to specific buckets (AWS limitation). Users will see bucket names across the account but can only read objects from their own tenant’s buckets.


Level 2: project-buckets-only¶

Purpose: This IAM policy implements policy that allows standard data science and ML engineering workflows while strictly preventing deletions and bucket-level changes. This policy uses an explicit allow for specific operations and relies on the absence of delete permissions to enforce your “Cannot Do” rules.

Typical Users:

  • Data scientists (standard access)

  • ML engineers (development work)

  • Automated training jobs

  • Experimentation workflows

What You Can Do:

  • ✅ Read-Only Everything

  • ✅ Upload & Overwrite

  • ✅ Create Folders

  • ✅ Modify Object Tags

"Sid": "ListBucketsAndLocation"

Applied to all S3 resources (*), these allow the user to see the “big picture” in the AWS Console or via CLI.

Policy Action

Description

s3:ListAllMyBuckets

Allows the user to list the names of all buckets owned by the AWS account.

s3:GetBucketLocation

Allows the user to see which AWS Region (e.g., us-east-1) a specific bucket resides in.

"Sid": "BucketLevelReadAndList"

These actions apply to the bucket itself, rather than the files inside it.

Policy Action

Description

s3:ListBucket

Allows the user to list the objects (files and folders) inside the bucket.

s3:GetBucketVersioning

Allows the user to check if the bucket has Versioning enabled (which keeps a history of object changes).

"Sid": "ObjectLevelReadWriteAndTagging"

These actions allow the user to manage the actual data and metadata within the buckets.

Policy Action

Description

s3:GetObject

Allows the user to download or read a file.

s3:GetObjectVersion

Allows the user to retrieve a specific historical version of a file (if versioning is on).

s3:PutObject

Allows the user to upload new files or update existing ones.

s3:PutObjectTagging

Allows the user to add or change “tags” (key-value pairs used for organization or billing) on an object.

s3:GetObjectTagging

Allows the user to view the tags currently assigned to an object.

What You Cannot Do:

Policy Action

Description

Delete anything

There are no s3:DeleteObject or s3:DeleteBucket permissions in the policy.

Manage Permissions

The user cannot change or view Access Control Lists (ACLs) or bucket policies (no s3:PutBucketPolicy, s3:GetBucketAcl, etc.).

Access other Buckets

The user can only list/read/write to buckets starting with the prefix edge-prod-b001-*. Any other bucket is restricted.

Modify Bucket Settings

Aside from viewing versioning, the user cannot change bucket configurations like Encryption, Logging, or Lifecycle rules.

Perform Administrative Tasks

The cannot create new buckets or delete existing ones.

Object Permanent Deletion

Even though a user can “Put” objects, he/she cannot remove them or manage object versions beyond reading them.

Manage Lifecycle or Encryption

Can not set up data archiving (Glacier), expiration rules, or modify server-side encryption settings.

CORS or Website Config

A user lacks permissions to configure the buckets for static website hosting or cross-origin resource sharing.

Example Scenario:

Marcus is training models and needs to upload preprocessed datasets to raw-data/project-x/ and save model artifacts to models/project-x/. He can overwrite files during iterative development but cannot accidentally delete the team’s shared datasets.

Sample Permissions:

[
  {
    "Sid": "ListBucketsAndLocation",
    "Effect": "Allow",
    "Action": [
      "s3:ListAllMyBuckets",
      "s3:GetBucketLocation"
    ],
    "Resource": "arn:aws:s3:::*"
  },
  {
    "Sid": "BucketLevelReadAndList",
    "Effect": "Allow",
    "Action": [
      "s3:ListBucket",
      "s3:GetBucketVersioning"
    ],
    "Resource": "arn:aws:s3:::edge-prod-b001-*"
  },
  {
    "Sid": "ObjectLevelReadWriteAndTagging",
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:PutObject",
      "s3:PutObjectTagging",
      "s3:GetObjectTagging"
    ],
    "Resource": [
      "arn:aws:s3:::edge-prod-b001-*",
      "arn:aws:s3:::edge-prod-b001-*/*"
    ]
  }
]

Level 3: project-buckets-full¶

Purpose: The following IAM policy provides “Full Object Access” for your project buckets. It allows senior data scientists and leads to manage all data (including deleting objects and versions) while strictly preventing any changes to the bucket’s structure or configuration.

Typical Users:

  • Senior data scientists

  • ML team leads

  • Project managers (data cleanup)

  • Cost optimization roles

What You Can Do:

  • ✅ Global Actions

  • ✅ Bucket-Level Access

  • ✅ Object-Level Management

"Sid": "AllowListAllBuckets"

Provides global visibility to see that S3 buckets exist and where they are located.

Policy Action

Description

s3:ListAllMyBuckets

Allows the user to list all buckets in the AWS account (required for viewing buckets in the AWS Console).

s3:GetBucketLocation

Allows the user to see the specific AWS Region where a bucket is hosted.

"Sid": "BucketLevelReadAndList"

Allows the user to see what is inside specific buckets (those starting with edge-prod-b001-).

Policy Action

Description

s3:ListBucket

Allows the user to list the objects (files) within a bucket.

s3:ListBucketVersions

Allows the user to list all versions of every object in the bucket.

s3:GetBucketVersioning

Allows the user to check if the bucket has versioning enabled or suspended.

"Sid": "ObjectLevelFullManagement"

Grants full control over the lifecycle and metadata of files within the edge-prod-b001- buckets.

Policy Action

Description

s3:GetObject

Allows reading/downloading a file.

s3:GetObjectVersion

Allows downloading a specific historical version of a file.

s3:PutObject

Allows uploading new files or updating existing ones.

s3:DeleteObject

Allows removing the current version of a file.

s3:DeleteObjectVersion

Allows permanently deleting a specific historical version of a file.

s3:PutObjectTagging

Allows adding or updating key-value tags on a file (often used for cost tracking or access control).

s3:GetObjectTagging

Allows viewing the tags associated with a file.

s3:AbortMultipartUpload

Allows canceling a large file upload that is currently in progress, which cleans up temporary storage parts.

What You Cannot Do:

  • ❌ Read from or write to other buckets

  • ❌ Administrative changes

  • ❌ Bucket Creation or Deletion

  • ❌ Permanent Deletions (MFA)

  • ❌ Permissions Management

  • ❌ Account-wide S3 Features

Policy Action

Description

Read from or write to other buckets

While a user can see the names of all buckets in the account, he/she cannot list the contents or download files from any bucket that doesn’t start with edge-prod-b001-.

Administrative changes

A user cannot delete buckets, change bucket policies, or modify encryption settings (policy does not include actions like s3:DeleteBucket or s3:PutBucketPolicy).

Bucket Creation or Deletion

There are no permissions in the policy to create a brand-new bucket or delete an existing one

Permanent Deletions (MFA)

If MFA delete is enabled on a bucket, a user wouldn’t be able to permanently purge versions without an MFA token.

Permissions Management

A user cannot grant other people access to these files because s3:PutObjectAcl or s3:PutBucketAcl are not included in the policy.

Account-wide S3 Features

This policy does not allow the user to manage account-level features like S3 Access Points, S3 Object Lambda, or S3 Batch Operations.

Example Scenario:

Elena is a senior ML engineer managing a project that generated 500GB of failed experiment artifacts over 6 months. She needs to delete these to reduce S3 costs while keeping successful model artifacts intact.

Why This Level Exists: The S3 provisioner creates buckets with versioning disabled by default to save costs. However, customers can enable versioning. This level includes DeleteObjectVersion so senior team members can clean up versions if needed, without requiring full admin access.

Sample Permissions:

[
  {
    "Sid": "AllowListAllBuckets",
    "Effect": "Allow",
    "Action": [
      "s3:ListAllMyBuckets",
      "s3:GetBucketLocation"
    ],
    "Resource": "arn:aws:s3:::*"
  },
  {
    "Sid": "BucketLevelReadAndList",
    "Effect": "Allow",
    "Action": [
      "s3:ListBucket",
      "s3:ListBucketVersions",
      "s3:GetBucketVersioning"
    ],
    "Resource": "arn:aws:s3:::edge-prod-b001-*"
  },
  {
    "Sid": "ObjectLevelFullManagement",
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:PutObject",
      "s3:DeleteObject",
      "s3:DeleteObjectVersion",
      "s3:PutObjectTagging",
      "s3:GetObjectTagging",
      "s3:AbortMultipartUpload"
    ],
    "Resource": [
      "arn:aws:s3:::edge-prod-b001-*",
      "arn:aws:s3:::edge-prod-b001-*/*"
    ]
  }
]

Level 4: full¶

Purpose: This policy allows the identified administrators and engineers to perform all S3 operations, including high-level bucket management like modifying policies, versioning, lifecycle rules, replication, and bucket deletion.

Typical Users:

  • MLOps platform administrators

  • DevOps engineers

  • Infrastructure team

  • Break-glass emergency access

What You Can Do:

This is a wildcard permission that covers all 100+ S3 operations.

  • ✅ Manage Buckets: Create new buckets, delete existing ones, and change bucket regions.

  • ✅ Manage Objects: Upload, download, copy, and permanently delete files (objects).

  • ✅ Control Security: Modify Bucket Policies, Access Control Lists (ACLs), and Public Access Block settings, potentially making data public.

  • ✅ Configure Features: Set up lifecycle rules (like auto-archiving to Glacier), enable versioning, configure replication, and manage encryption settings.

  • ✅ Account-Level Tasks: View storage inventory, analytics, and metrics for the entire S3 service.

What You Cannot Do:

  • ❌ Nothing - this is full S3 access within your environment

Example Scenario:

James is the MLOps platform owner who needs to configure S3 lifecycle policies to automatically archive old model artifacts to Glacier after 90 days, reducing storage costs by 70%.

Security Note: ⚠ This level should be assigned sparingly. Most users need project-buckets-only or project-buckets-full.

Sample Permissions:

[
  {
    "Sid": "S3FullAccessPermissions",
    "Effect": "Allow",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::*",
      "arn:aws:s3:::*/*"
    ]
  }
]

ECR Policies¶

ECR (Elastic Container Registry) stores your Docker images for ML training, inference, and pipeline components.

Enterprise Compliance Model¶

ECR policies follow a 4-level model that separates human access from automation access — critical for regulated industries (FinTech, Healthcare, Government) where compliance requires separation of duties.

Level Overview:

Level

Name

Who

Purpose

1

read-only

Runtime environments

Pull images — pure consumers

2

dev-read-write

Data scientists (humans)

Push/pull, create repos, trigger scans interactively

3

ci-read-write

CI/CD pipelines (automation)

Push/pull, create repos, validate lifecycle rules

4

full

MLOps administrators

Complete registry management including deletion

Key Distinction — Level 2 vs Level 3:

  • dev-read-write → Assigned to IAM users (humans)

  • ci-read-write → Assigned to IAM roles (CI/CD automation)

While both levels share a core set of push/pull/discovery actions, they are shaped by who uses them:

  • Level 2 includes ecr:StartImageScan and ecr:DescribeImageScanFindings — humans trigger and review scans interactively

  • Level 3 includes ecr:GetLifecyclePolicyPreview — CI pipelines validate lifecycle rules as part of infrastructure automation

  • Level 3 omits scan actions because CI relies on ECR’s scan-on-push setting

This separation ensures audit trails clearly show human vs automated actions, satisfying SOC2, HIPAA, and PCI-DSS requirements.


Level 1: read-only¶

Purpose: Pull images for local development and testing

Typical Users:

  • Data scientists (local testing)

  • QA engineers

  • Security scanners

  • Developers onboarding to the platform

What You Can Do:

  • ✅ Authenticate

  • ✅ Pull Images

  • ✅ View Metadata

  • ✅ Check Security

"Sid": "AllowECRAuth"

Grants the basic permission required to authenticate with Amazon ECR. This is the “handshake” step needed before any other ECR actions can be performed.

Policy Action

Description

ecr:GetAuthorizationToken

Allows the user to retrieve an encrypted authorization token. This token is used with the Docker CLI (via aws ecr get-login-password) to authenticate your local environment to the registry.

"Sid": "AllowReadOnlyPullAndMetadata"

Provides “Read-Only” access to ECR. It allows users to view repository details and pull (download) images, but does not allow them to upload (push), delete, or modify anything.

Policy Action

Description

ecr:BatchCheckLayerAvailability

Allows the user to check if the specific “layers” that make up a Docker image already exist in the repository.

ecr:GetDownloadUrlForLayer

Provides a URL to download a specific image layer; this is a background action required for the docker pull command to function.

ecr:BatchGetImage

Allows the user to retrieve the detailed information (manifests) for a specific set of images to facilitate downloading them.

ecr:DescribeRepositories

Allows the user to see a list of repositories within the registry and view their settings.

ecr:ListImages

Allows the user to view a list of all image tags and digests within a specific repository.

ecr:DescribeImages

Provides detailed metadata about specific images, such as the size, push date, and associated tags.

ecr:DescribeImageScanFindings

Allows the user to view the results of vulnerability scans performed on the images.

What You Cannot Do:

  • ❌ Upload Images

  • ❌ Delete Content

  • ❌ Modify Settings

Policy Action

Description

Upload Images

A user can not push new images or layers (actions not present in the policy: ecr:PutImage, ecr:InitiateLayerUpload, etc.).

Delete Content

A user can not delete images, tags or repositories (actions ecr:BatchDeleteImage and ecr:DeleteRepository are not included in the policy

Modify Settings

A user can not create repositories, change permissions, or update lifecycle policies (actions ecr:CreateRepository and ecr:SetRepositoryPolicy are not included in the policy

Example Scenario:

Priya is a data scientist who needs to pull the team’s base ML training image (acme-mlops-dev/ml-training:v2.1) to run experiments locally on her laptop.

Sample Permissions:

[
  {
    "Sid": "AllowECRAuth",
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AllowReadOnlyPullAndMetadata",
    "Effect": "Allow",
    "Action": [
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:DescribeImages",
      "ecr:DescribeImageScanFindings"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  }
]

Docker Command:

This command sequence authenticates your local Docker client with a private Amazon Elastic Container Registry (ECR) and then downloads a specific container image to your machine. Together, these commands ensure you have the necessary permissions to access a private AWS repository and download a machine learning training image (ml-training:v2.1) used in your MLOps development environment. The authorization token provided by AWS is valid for 12 hours, after which you must run the login command again.

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.1

Explanation:

Part 1: Authentication

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com 
  • aws ecr get-login-password: This AWS CLI command retrieves a temporary base64-encoded authorization token.

  • –region us-east-1: Specifies the AWS Region where your registry is hosted.

  • | (The Pipe): This takes the password generated by the first command and passes it directly as input to the next command.

  • docker login: Initializes the login process for a Docker registry.

  • –username AWS: For Amazon ECR, the username is always AWS.

  • –password-stdin: Tells Docker to read the password from the “standard input” (the pipe), which is more secure than typing it out.

  • 123456789012.dkr.ecr.us-east-1.amazonaws.com: This is the unique URI for your private registry. It follows the format <account_id>.dkr.ecr.<region>.amazonaws.com.

Part 2: Pulling the Image

docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.1
  • docker pull: The standard command to download an image from a registry.

  • acme-mlops-dev/ml-training: The name of the specific repository within your ECR registry.

  • :v2.1: The specific version tag of the image you want to download.


Level 2: dev-read-write¶

Purpose: This JSON policy grants the necessary permissions for developers to build, tag, and push images while maintaining read-only visibility and the ability to create new repositories. It explicitly excludes destructive or administrative actions like deleting repositories, modifying policies, or changing lifecycles.

Typical Users:

  • ML engineers (manual image builds)

  • DevOps engineers (troubleshooting)

  • Platform developers (base image maintenance)

Assignment: IAM Users only (not roles)

What You Can Do:

  • ✅ Everything in read-only, PLUS:

  • ✅ Push new Docker images

  • ✅ Tag images

  • ✅ Create new repositories

  • ✅ Initiate image scans

"Sid": "ReadOnlyAndDiscovery"

Policy Action

Description

ecr:GetAuthorizationToken

Obtain a temporary password to authenticate a Docker CLI to the registry.

ecr:DescribeRepositories

View metadata about existing repositories (e.g., URI, creation date).

ecr:DescribeImages

View metadata about images within a repository (e.g., push date, size, tags).

ecr:ListImages

Get a list of all image IDs in a repository.

ecr:BatchGetImage

Pull/download image manifest information for one or more images.

ecr:GetRepositoryPolicy

View the JSON resource-level policy attached to a repository.

ecr:GetLifecyclePolicy

View the rules that automatically clean up old images.

ecr:ListTagsForResource

View the tags (key-value pairs) assigned to an ECR resource.

ecr:DescribeImageScanFindings

View the security vulnerability reports for scanned images.

"Sid": "PushAndTagImages"

Policy Action

Description

ecr:BatchCheckLayerAvailability

Check if specific image layers already exist in the registry (used during push).

ecr:GetDownloadUrlForLayer

Retrieve a URL to download specific image layers.

ecr:InitiateLayerUpload

The first step of the process required to upload new image layers to a repository.

ecr:UploadLayerPart

The second step of the process required to upload new image layers to a repository.

ecr:CompleteLayerUpload

The third step of the process required to upload new image layers to a repository.

ecr:PutImage

Finalize the upload by adding the image manifest to the repository.

ecr:TagResource

Add or update tags (metadata) on ECR resources like repositories.

"Sid": "ManageRepositoriesAndScans"

Policy Action

Description

ecr:CreateRepository

Create entirely new, empty repositories.

ecr:StartImageScan

Manually trigger a vulnerability scan on an existing image.

What You Cannot Do:

  • ❌ Delete images, repositories or life cycle policies

  • ❌ Modify repository policies

  • ❌ Change lifecycle policies

Policy Action

Description

Delete Resources

The policy lacks ecr:DeleteRepository, ecr:BatchDeleteImage, or ecr:DeleteLifecyclePolicy. Users cannot remove any data.

Modify Policies

There are no PutRepositoryPolicy or SetRepositoryPolicy permissions; users cannot change who has access to these resources.

Lifecycle Management

Users can view policies but cannot create or change them (ecr:PutLifecyclePolicy).

Public ECR

This policy applies to Private ECR. It does not grant permissions for ecr-public actions.

Example Scenario:

Tom is an ML engineer who built a new training image with updated dependencies. He needs to push it to ECR so the team can test it before integrating into the CI/CD pipeline.

Sample Permissions:

[
  {
    "Sid": "AllowECRAuth",
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ReadOnlyAndDiscovery",
    "Effect": "Allow",
    "Action": [
      "ecr:DescribeRepositories",
      "ecr:DescribeImages",
      "ecr:ListImages",
      "ecr:BatchGetImage",
      "ecr:GetRepositoryPolicy",
      "ecr:GetLifecyclePolicy",
      "ecr:ListTagsForResource",
      "ecr:DescribeImageScanFindings"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  },
  {
    "Sid": "PushAndTagImages",
    "Effect": "Allow",
    "Action": [
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:InitiateLayerUpload",
      "ecr:UploadLayerPart",
      "ecr:CompleteLayerUpload",
      "ecr:PutImage",
      "ecr:TagResource"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  },
  {
    "Sid": "ManageRepositoriesAndScans",
    "Effect": "Allow",
    "Action": [
      "ecr:CreateRepository",
      "ecr:StartImageScan"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  }
]

Docker Commands:

# Build and push
docker build -t ml-training:v2.2 .

docker tag ml-training:v2.2 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.2

docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.2

Why No Delete Permission: Image cleanup should be handled by ECR Lifecycle Policies (automated, safe) rather than manual deletion (error-prone, risky).


Level 3: ci-read-write¶

Purpose: CI/CD pipelines building and pushing images automatically

Typical Users:

  • GitHub Actions workflows

  • Jenkins build jobs

  • CodePipeline stages

  • GitLab CI runners

Assignment: IAM Roles only (not users)

What You Can Do:

  • ✅ Push images from automated builds

  • ✅ Tag images with build metadata

  • ✅ Create repositories on-demand

  • ✅ Read and preview lifecycle policies (for infra-as-code validation)

How It Differs from dev-read-write:

  • ➕ Adds ecr:GetLifecyclePolicyPreview — CI pipelines validate lifecycle rules as part of infrastructure automation

  • ➖ Removes ecr:StartImageScan and ecr:DescribeImageScanFindings — CI relies on ECR’s scan-on-push setting rather than triggering scans directly

What You Cannot Do:

  • ❌ No Deletion: Actions like ecr:BatchDeleteImage or ecr:DeleteRepository are not included, preventing runners from removing version history or entire projects.

  • ❌ No Security Modification: ecr:SetRepositoryPolicy and ecr:DeleteRepositoryPolicy are excluded, ensuring the pipeline cannot change who has access to the images.

  • ❌ No Lifecycle Changes: While the runner can read lifecycle policies, it cannot modify or delete them (ecr:PutLifecyclePolicy), ensuring automated cleanup rules remain intact.

Example Scenario:

A GitHub Actions workflow automatically builds a new training image on every merge to main, tags it with the git commit SHA, and pushes it to ECR for deployment.

Sample Permissions:

[
    {
    "Sid": "AllowECRAuth",
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AllowRepositoryCreation",
    "Effect": "Allow",
    "Action": [
      "ecr:CreateRepository"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  },
  {
    "Sid": "ContinuousIntegrationReadWrite",
    "Effect": "Allow",
    "Action": [
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:GetRepositoryPolicy",
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:DescribeImages",
      "ecr:BatchGetImage",
      "ecr:GetLifecyclePolicy",
      "ecr:GetLifecyclePolicyPreview",
      "ecr:ListTagsForResource",
      "ecr:InitiateLayerUpload",
      "ecr:UploadLayerPart",
      "ecr:CompleteLayerUpload",
      "ecr:PutImage",
      "ecr:TagResource"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  }
]

"Sid": "AllowRepositoryCreation"

Policy Action

Description

ecr:CreateRepository

Allows the user to create a new, empty repository to store Docker or OCI-compliant images.

"Sid": "ContinuousIntegrationReadWrite"

Authentication

Policy Action

Description

ecr:GetAuthorizationToken

Allows the user to request a short-lived password (token) to authenticate a Docker CLI client against ECR.

Read/Pull Actions

Policy Action

Description

ecr:BatchCheckLayerAvailability

Checks if specific image layers already exist in the repository.

ecr:GetDownloadUrlForLayer

Retrieves a URL to download a specific image layer.

ecr:GetRepositoryPolicy

Allows the user to view the resource-based permissions policy of a repository.

ecr:DescribeRepositories

Returns metadata about repositories (e.g., creation date, URI, and settings).

ecr:ListImages

Lists basic information about the images stored in a repository.

ecr:DescribeImages

Provides detailed metadata about images, such as size, push date, and tags.

ecr:BatchGetImage

Allows the user to retrieve the image manifest or configuration for one or more images (required for pulling).

ecr:GetLifecyclePolicy

Retrieves the current lifecycle rules (which automate image deletion).

ecr:GetLifecyclePolicyPreview

Allows the user to see the results of a lifecycle policy before it is applied.

ecr:ListTagsForResource

Displays the tags (metadata) associated with a specific ECR repository.

Write/Push Actions

Policy Action

Description

ecr:InitiateLayerUpload

Starts the multi-step process of uploading an image layer.

ecr:UploadLayerPart

Allows the user to upload a specific segment of an image layer.

ecr:CompleteLayerUpload

Informs ECR that all parts of a layer have been uploaded and can be finalized.

ecr:PutImage

Finalizes the push process by uploading the image manifest, making the image available in the repository.

ecr:TagResource

Allows the user to add or update metadata tags on the repository itself.

GitHub Actions Example:

- name: Login to ECR
  run: 
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $ECR_REGISTRY

- name: Build and push
  run: |
    docker build -t $ECR_REGISTRY/acme-mlops-dev/ml-training:$GITHUB_SHA .
    docker push $ECR_REGISTRY/acme-mlops-dev/ml-training:$GITHUB_SHA

Compliance Note: In regulated industries, auditors can trace:

  • Human actions → IAM user CloudTrail logs (dev-read-write)

  • Automated actions → IAM role CloudTrail logs (ci-read-write)

This separation satisfies SOC2, HIPAA, and PCI-DSS requirements.


Level 4: full¶

This policy uses the wildcard ecr:* to grant all available permissions within the Amazon ECR service.

Purpose: Administrative access for repository management

Typical Users:

  • MLOps platform administrators

  • DevOps team leads

  • Security engineers (policy management)

What You Can Do:

Functionality

Policy Action

Description

✅ Delete Images and Repositories

ecr:BatchDeleteImage

Allows the deletion of multiple specified images within a repository.

ecr:DeleteRepository

Grants the ability to permanently remove an entire repository.

✅ Modify Repository Policies

ecr:SetRepositoryPolicy

Allows you to apply or change resource-based policies to control who can access specific repositories.

ecr:DeleteRepositoryPolicy

Enables the removal of existing repository access policies.

✅ Configure Lifecycle Policies

ecr:PutLifecyclePolicy

Allows creating or updating rules that automatically expire or delete old images based on age or count.

ecr:GetLifecyclePolicy

Permits viewing the current automated cleanup rules.

✅ Set Up Cross-Region Replication

ecr:PutReplicationConfiguration

Grants permission to configure settings that automatically copy images to other AWS regions or accounts.

✅ Manage Image Scanning Settings

ecr:PutImageScanningConfiguration

Allows you to enable or disable automatic vulnerability scanning upon image push.

ecr:StartImageScan

Permits manually triggering a security scan for a specific image.

✅ Repository and Image Management

ecr:CreateRepository

Allows the creation of new private repositories to store container images.

ecr:DescribeRepositories and ecr:DescribeImages

Provides the ability to list and view metadata for all repositories and images.

ecr:PutImage

Allows pushing new container images or updating existing ones.

What You Cannot Do:

  • ❌ No Restrictions - This policy is designed for full administrative access; there are no denied actions within the ECR service scope.

Example Scenario:

Lisa is the platform administrator who needs to configure an ECR Lifecycle Policy to automatically delete untagged images after 7 days and keep only the last 10 tagged images per repository, reducing storage costs.

Security Note: ⚠ Most users need read-only or dev-read-write. Reserve full access for platform administrators. While this policy allows all ECR actions, users still require ecr:GetAuthorizationToken (included in ecr:*) to authenticate their Docker CLI with the registry.

Sample Permissions:

[
  {
    "Sid": "FullECRAdminAccess",
    "Effect": "Allow",
    "Action": [
        "ecr:*"
    ],
    "Resource": "*"
  }
]

Pipeline Policies¶

SageMaker Pipeline policies control access to ML training and deployment workflows.

Enterprise Compliance Model¶

Pipeline policies follow a 4-level model that separates human access from automation access - mirroring the ECR pattern for consistency and compliance.

Key Distinction:

  • project-dev → Assigned to IAM users (humans)

  • project-ci → Assigned to IAM roles (CI/CD automation)

Both have identical permissions, but the assignment pattern ensures audit trails clearly show human vs automated actions.


Level 1: read-only¶

Purpose: This policy is designed for read-only governance and monitoring across CI/CD and Machine Learning workflows. It allows a user to audit the status, history, and logs of automated pipelines without the ability to create, modify, or delete resources.

The primary goal is to provide full visibility into the state of AWS CodePipeline and SageMaker Model Building Pipelines. It is ideal for auditors, project managers, or automated monitoring tools that need to track deployment progress and execution history across an entire AWS account.

Typical Users:

  • Auditors and compliance reviewers

  • Model risk managers

  • Executive stakeholders

  • New team members learning the platform

  • Cross-team visibility roles

What You Can Do (CI/CD pipelines):

  • ✅ View the full architecture and configuration of any pipeline.

  • ✅ Access the complete execution history for auditing purposes.

  • ✅ Monitor the live progress of a running pipeline and review its logs.

  • ✅ Examine pipeline steps and configurations (e.g., environment variables, source branches).

  • ✅ Access and export lists of pipelines and executions for compliance reporting.

What You Cannot Do (Restrictive actions for CI/CD pipelines):

  • ❌ Create or Modify: You cannot change the pipeline’s structure, add new stages, or delete existing ones.

  • ❌ Start Executions: You are barred from manually triggering a new pipeline run.

  • ❌ Stop/Cancel: You cannot intervene in an active process to stop or roll it back.

  • ❌ Delete: You do not have the permission to remove pipeline resources or execution history.

What You Can Do (Sagemaker pipelines):

  • ✅ List all available pipelines in the account to provide a high-level overview for audit purposes.

  • ✅ View the history of all pipeline runs, allowing auditors to see when and how many times a workflow was triggered.

  • ✅ Examine the individual steps within a specific execution to verify that each stage (e.g., training, processing) completed as expected.

  • ✅ Retrieve the metadata and configuration of a pipeline definition to review its architectural design.

  • ✅ View the current status (e.g., Succeeded, Failed) and specific details of a single execution run.

  • ✅ Access the exact version of the pipeline definition used for a specific historical run, ensuring the “as-run” configuration is verifiable.

What You Cannot Do (Restrictive actions for Sagemaker pipelines):

  • ❌ sagemaker:CreatePipeline: Prevent the creation of new workflows that could bypass established compliance checks.

  • ❌ sagemaker:UpdatePipeline: Ensure that existing validated pipeline definitions remain immutable and cannot be altered.

  • ❌ sagemaker:StartPipelineExecution: Disable the ability to trigger new runs, preventing unauthorized compute costs or production changes.

  • ❌ sagemaker:StopPipelineExecution: Prevent users from interfering with active, ongoing production workloads.

  • ❌ sagemaker:DeletePipeline: Protect historical audit trails and definitions from being permanently removed.

Example Scenario:

Rachel is a model risk manager who needs to review all ML training pipelines quarterly to ensure they meet compliance requirements for bias detection and data validation. She needs to see pipeline configurations and execution logs but should not be able to trigger or modify any workflows.

Sample Permissions:

[
  {
    "Sid": "CodePipelineReadOnly",
    "Effect": "Allow",
    "Action": [
      "codepipeline:GetPipeline",
      "codepipeline:GetPipelineExecution",
      "codepipeline:GetPipelineState",
      "codepipeline:ListPipelines",
      "codepipeline:ListPipelineExecutions",
      "codepipeline:ListActionTypes",
      "codepipeline:ListTagsForResource"
    ],
    "Resource": "*"
  },
  {
    "Sid": "CodeBuildReadOnly",
    "Effect": "Allow",
    "Action": [
      "codebuild:BatchGetBuilds",
      "codebuild:ListBuilds"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PipelineLogsReadOnly",
    "Effect": "Allow",
    "Action": [
      "logs:GetLogEvents",
      "logs:DescribeLogStreams"
    ],
    "Resource": "*"
  },
  {
    "Sid": "SagemakerPipelineReadOnly",
    "Effect": "Allow",
    "Action": [
      "sagemaker:ListPipelines",
      "sagemaker:ListPipelineExecutions",
      "sagemaker:ListPipelineExecutionSteps",
      "sagemaker:DescribePipeline",
      "sagemaker:DescribePipelineExecution",
      "sagemaker:DescribePipelineDefinitionForExecution",
      "sagemaker:GetSearchSuggestions"
    ],
    "Resource": "*"
  }
]

"Sid": "CodePipelineReadOnly"

Grants read-only access to all AWS CodePipeline resources across the account. Uses Resource: * intentionally — auditors and governance teams need account-wide visibility across all tenants to perform compliance reviews.

Policy Action

Description

codepipeline:GetPipeline

View the detailed definition and structure of a pipeline.

codepipeline:GetPipelineExecution

View the status and details of a specific execution instance.

codepipeline:GetPipelineState

Monitor the real-time status of each stage and action within a pipeline.

codepipeline:ListPipelines

List all available pipelines in the account across all tenants.

codepipeline:ListPipelineExecutions

View the history of all past and current pipeline runs.

codepipeline:ListActionTypes

See what types of actions (e.g., Build, Deploy, Test) are available for use.

codepipeline:ListTagsForResource

Review metadata tags used for cost tracking and organizational governance.

"Sid": "CodeBuildReadOnly"

Grants read-only access to all AWS CodeBuild projects across the account. Uses Resource: * for the same governance reason — auditors need visibility into build jobs across all tenants.

Policy Action

Description

codebuild:BatchGetBuilds

View details of specific build jobs triggered by pipelines.

codebuild:ListBuilds

List all build jobs for visibility into build history across all tenants.

"Sid": "PipelineLogsReadOnly"

Grants read-only access to CloudWatch Logs for reviewing pipeline and build execution logs. Uses Resource: * because log group names are generated by AWS services at runtime and do not follow a predictable naming pattern.

Policy Action

Description

logs:GetLogEvents

Access execution logs for audit trails and compliance reviews.

logs:DescribeLogStreams

List available log streams to locate relevant log output.

"Sid": "SagemakerPipelineReadOnly"

Grants read-only access to all SageMaker Pipelines across the account. Uses Resource: * intentionally — auditors need to review ML pipeline configurations, execution history, and step-level details across all tenants for compliance verification.

Policy Action

Description

sagemaker:ListPipelines

List all available pipelines in the account to provide a high-level overview for audit purposes.

sagemaker:ListPipelineExecutions

View the history of all pipeline runs, allowing auditors to see when and how many times a workflow was triggered.

sagemaker:ListPipelineExecutionSteps

Examine the individual steps within a specific execution to verify that each stage (e.g., training, processing) completed as expected.

sagemaker:DescribePipeline

Retrieve the metadata and configuration of a pipeline definition to review its architectural design.

sagemaker:DescribePipelineExecution

View the current status (e.g., Succeeded, Failed) and specific details of a single execution run.

sagemaker:DescribePipelineDefinitionForExecution

Access the exact version of the pipeline definition used for a specific historical run, ensuring the “as-run” configuration is verifiable.

sagemaker:GetSearchSuggestions

Use autocomplete/suggestions when searching for SageMaker resources.

Resource Scope: All four Sids use Resource: * intentionally. Read-only governance access requires account-wide visibility across all tenants — auditors must be able to review any team’s pipelines, builds, and execution logs to perform compliance assessments. This is consistent with how AWS managed policies like ReadOnlyAccess and AWSCloudTrail_ReadOnlyAccess are designed.

Compliance Use Case: In regulated industries, auditors must verify that ML pipelines include required validation steps (data quality checks, bias detection, model explainability). Read-only access enables these reviews without risk of accidental modifications.


Level 2: project-dev¶

Purpose: Human developers creating and managing ML pipelines manually. This policy grants the ability to build, iterate on, and execute both CI/CD delivery pipelines (CodePipeline/CodeBuild) and ML workflow pipelines (SageMaker) within a tenant-scoped boundary.

Typical Users:

  • Data scientists (manual pipeline runs and experimentation)

  • ML engineers (pipeline development and iteration)

  • Research teams (prototyping ML workflows)

Assignment: IAM Users only (not roles)

How It Differs from read-only:

  • ➕ Adds CodePipeline write actions — create, update, start, stop, retry pipelines (read-only has view-only access)

  • ➕ Adds CodeBuild write actions — start and stop builds (read-only can only view build results)

  • ➕ Adds SageMaker Pipeline write actions — create, update, start, stop executions (read-only can only view pipeline state)

  • ➕ Adds codepipeline:TagResource — organize pipeline resources with metadata tags

  • 🔒 Tightens resource scoping — CodePipeline, CodeBuild, and SageMaker are scoped to {company_prefix}-{env}-{tenant_id}-* (read-only uses Resource: * for account-wide governance visibility)

  • ➖ Removes sagemaker:DescribePipelineDefinitionForExecution and sagemaker:GetSearchSuggestions — these are governance/audit actions not needed for active development

What You Can Do (CI/CD pipelines):

  • ✅ Create and update CodePipeline definitions for your project

  • ✅ Start and stop pipeline executions manually

  • ✅ Retry failed stages during development

  • ✅ Start and monitor CodeBuild projects

  • ✅ View build logs for debugging

  • ✅ Tag pipeline resources for organization

What You Can Do (SageMaker pipelines):

  • ✅ Create and update SageMaker Pipeline definitions

  • ✅ Start and stop pipeline executions manually

  • ✅ View execution history, step details, and pipeline configurations

  • ✅ Iterate on pipeline design with different parameters

What You Cannot Do:

  • ❌ No Deletion: codepipeline:DeletePipeline and sagemaker:DeletePipeline are excluded — pipelines are removed through admin-level access only, protecting execution history and audit trails.

  • ❌ No Cross-Tenant Access: Resource scoping limits access to pipelines matching {company_prefix}-{env}-{tenant_id}-*, preventing access to other teams’ workflows.

  • ❌ No Platform-Wide Settings: Cannot modify account-level CodePipeline or SageMaker configurations.

Example Scenario:

Tom is an ML engineer developing a new fraud detection pipeline. He creates a SageMaker Pipeline definition in Python, triggers training runs with different hyperparameters, and monitors execution steps — while the CodePipeline he set up automatically rebuilds the pipeline on each commit to his feature branch.

Sample Permissions:

[
  {
    "Sid": "CodePipelineDevAccess",
    "Effect": "Allow",
    "Action": [
      "codepipeline:CreatePipeline",
      "codepipeline:UpdatePipeline",
      "codepipeline:GetPipeline",
      "codepipeline:GetPipelineExecution",
      "codepipeline:GetPipelineState",
      "codepipeline:ListPipelines",
      "codepipeline:ListPipelineExecutions",
      "codepipeline:ListActionTypes",
      "codepipeline:ListTagsForResource",
      "codepipeline:StartPipelineExecution",
      "codepipeline:StopPipelineExecution",
      "codepipeline:RetryStageExecution",
      "codepipeline:TagResource"
    ],
    "Resource": "arn:aws:codepipeline:*:*:{company_prefix}-{env}-{tenant_id}-*"
  },
  {
    "Sid": "CodeBuildDevAccess",
    "Effect": "Allow",
    "Action": [
      "codebuild:StartBuild",
      "codebuild:StopBuild",
      "codebuild:BatchGetBuilds",
      "codebuild:ListBuilds"
    ],
    "Resource": "arn:aws:codebuild:*:*:project/{company_prefix}-{env}-{tenant_id}-*"
  },
  {
    "Sid": "PipelineLogsAccess",
    "Effect": "Allow",
    "Action": [
      "logs:GetLogEvents",
      "logs:DescribeLogStreams"
    ],
    "Resource": "*"
  },
  {
    "Sid": "SagemakerPipelineDevAccess",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreatePipeline",
      "sagemaker:UpdatePipeline",
      "sagemaker:DescribePipeline",
      "sagemaker:StartPipelineExecution",
      "sagemaker:StopPipelineExecution",
      "sagemaker:DescribePipelineExecution",
      "sagemaker:ListPipelineExecutions",
      "sagemaker:ListPipelineExecutionSteps"
    ],
    "Resource": "arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{tenant_id}-*"
  }
]

"Sid": "CodePipelineDevAccess"

Grants development-level access to AWS CodePipeline, scoped to pipelines matching the tenant’s naming convention. This ensures developers can only manage their own team’s CI/CD pipelines.

Pipeline Management

Policy Action

Description

codepipeline:CreatePipeline

Create new CI/CD pipeline definitions for the project.

codepipeline:UpdatePipeline

Modify existing pipeline stages, actions, and configurations.

codepipeline:StartPipelineExecution

Manually trigger a pipeline run.

codepipeline:StopPipelineExecution

Cancel a running pipeline execution.

codepipeline:RetryStageExecution

Re-run a failed stage without restarting the entire pipeline.

codepipeline:TagResource

Add or update metadata tags on pipeline resources.

Read/Monitor Actions

Policy Action

Description

codepipeline:GetPipeline

View the detailed definition and structure of a pipeline.

codepipeline:GetPipelineExecution

View the status and details of a specific execution.

codepipeline:GetPipelineState

Monitor the real-time status of each stage and action.

codepipeline:ListPipelines

List all available pipelines in the account.

codepipeline:ListPipelineExecutions

View the history of all pipeline runs.

codepipeline:ListActionTypes

See what types of actions (Build, Deploy, Test) are available.

codepipeline:ListTagsForResource

Review metadata tags for cost tracking and governance.

"Sid": "CodeBuildDevAccess"

Grants development-level access to AWS CodeBuild projects, scoped to build projects matching the tenant’s naming convention.

Policy Action

Description

codebuild:StartBuild

Trigger a CodeBuild project to compile, test, or package code.

codebuild:StopBuild

Cancel a running build job.

codebuild:BatchGetBuilds

View details of specific build jobs triggered by the pipeline.

codebuild:ListBuilds

List all build jobs for visibility into build history.

"Sid": "PipelineLogsAccess"

Grants read access to CloudWatch Logs for debugging pipeline and build execution. This Sid uses Resource: * because log group names are generated by CodePipeline and CodeBuild at runtime and do not follow a predictable tenant-scoped naming pattern.

Policy Action

Description

logs:GetLogEvents

Access execution logs for debugging and troubleshooting.

logs:DescribeLogStreams

List available log streams to locate relevant log output.

"Sid": "SagemakerPipelineDevAccess"

Grants development-level access to SageMaker ML pipelines, scoped to the tenant’s project namespace.

Pipeline Management

Policy Action

Description

sagemaker:CreatePipeline

Define a new SageMaker Pipeline for ML workflows (training, processing, evaluation).

sagemaker:UpdatePipeline

Modify an existing pipeline definition to iterate on the workflow design.

sagemaker:StartPipelineExecution

Trigger a pipeline run with specified parameters (e.g., hyperparameters, data paths).

sagemaker:StopPipelineExecution

Cancel a running execution to stop compute costs or abort a misconfigured run.

Read/Monitor Actions

Policy Action

Description

sagemaker:DescribePipeline

View the metadata and configuration of a pipeline definition.

sagemaker:DescribePipelineExecution

View the status, parameters, and details of a specific execution run.

sagemaker:ListPipelineExecutions

View the history of all runs for a pipeline to track iteration progress.

sagemaker:ListPipelineExecutionSteps

Examine individual steps within an execution to identify which step failed or succeeded.

Resource Scope: All four Sids are tenant-scoped to {company_prefix}-{env}-{tenant_id}-*, ensuring developers can only access their own team’s resources. The only exception is PipelineLogsAccess which uses Resource: * because CloudWatch log group names are generated by AWS services at runtime and do not follow a predictable tenant-scoped naming pattern. This is a known AWS limitation — log access can be further restricted via log group resource policies as the platform matures.


Level 3: project-ci¶

Purpose: This IAM policy provides a self-contained set of permissions for CI/CD runners (GitHub Actions, Jenkins, GitLab CI, AWS CodePipeline) to automate end-to-end ML workflows. It includes SageMaker Pipeline orchestration, container registry access, pipeline asset retrieval, configuration/secrets access, and the ability to pass execution roles to SageMaker — everything a CI/CD runner needs to function without requiring additional policy assignments.

Unlike project-dev (which targets human developers working across both CodePipeline and SageMaker), project-ci is focused on automated SageMaker Pipeline workflows with the supporting infrastructure permissions that runners need to operate independently.

Typical Users:

  • GitHub Actions workflows

  • Jenkins build jobs

  • GitLab CI runners

  • AWS CodePipeline stages

Assignment: IAM Roles only (not users)

How It Differs from project-dev:

  • ➖ Removes CodePipeline/CodeBuild management actions — CI/CD runners interact with SageMaker Pipelines directly, not through CodePipeline console

  • ➖ Removes CloudWatch logs read actions — runners capture logs through their own logging mechanisms (GitHub Actions logs, Jenkins console output)

  • ➕ Adds S3 read access for pipeline assets — runners need to download pipeline definitions, code, and model artifacts

  • ➕ Adds Secrets Manager and SSM Parameter Store read access — runners need configuration and secrets for pipeline execution

  • ➕ Adds ECR push/pull access — runners build and push container images used in ML pipeline steps

  • ➕ Adds iam:PassRole — runners must pass execution roles to SageMaker for training and processing jobs

  • ➕ Adds sagemaker:ListPipelines — runners need to discover existing pipelines to decide whether to create or update

What You Can Do:

  • ✅ Create and update SageMaker Pipeline definitions from code

  • ✅ Discover existing pipelines to determine create vs update

  • ✅ Trigger pipeline executions automatically on code merge

  • ✅ Stop pipelines on failure conditions

  • ✅ Monitor execution progress and report step-level status

  • ✅ Download pipeline definitions and artifacts from S3

  • ✅ Retrieve configuration and secrets for pipeline execution

  • ✅ Authenticate to ECR and push/pull container images

  • ✅ Pass execution roles to SageMaker for training and processing jobs

What You Cannot Do:

  • ❌ Delete pipelines

  • ❌ Access other teams’ pipelines

  • ❌ Modify platform-wide settings

  • ❌ Delete container images or repositories

  • ❌ Modify ECR repository policies or lifecycle rules

  • ❌ Write to S3 (read-only access for pipeline assets)

  • ❌ Create or modify secrets/parameters (read-only access)

  • ❌ Pass roles to services other than SageMaker

Restriction

Description

No Pipeline Deletion

sagemaker:DeletePipeline is excluded — pipeline removal is an admin-level action. CI/CD should create and update, not destroy. If a pipeline needs removal, that’s a human decision through project-dev or full access.

No Cross-Tenant Access

SageMaker actions are scoped to {company_prefix}-{env}-{tenant_id}-*, preventing access to other teams’ pipelines.

No Container Deletion

ECR actions do not include ecr:BatchDeleteImage or ecr:DeleteRepository — image cleanup should be handled by ECR Lifecycle Policies, not CI/CD runners.

No S3 Write Access

Runners can read pipeline assets but cannot modify or delete them — pipeline definitions are managed through version control, not runner writes.

No Secrets Modification

Runners can read secrets and parameters but cannot create, update, or delete them — secrets management is an admin responsibility.

No Unrestricted Role Passing

iam:PassRole is conditioned to sagemaker.amazonaws.com only — runners cannot pass roles to other services, limiting blast radius.

Example Scenario:

A GitHub Actions workflow triggers on merge to main. It downloads the pipeline definition from S3, pulls the base training image from ECR, builds a new image with updated code, pushes it to ECR, creates/updates the SageMaker Pipeline definition, passes the SageMaker execution role, starts a training run, and monitors step-level execution status — reporting results back to the GitHub PR.

Sample Permissions:

[
  {
    "Sid": "SageMakerPipelineManagement",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreatePipeline",
      "sagemaker:UpdatePipeline",
      "sagemaker:DescribePipeline",
      "sagemaker:ListPipelines",
      "sagemaker:StartPipelineExecution",
      "sagemaker:StopPipelineExecution",
      "sagemaker:DescribePipelineExecution",
      "sagemaker:ListPipelineExecutions",
      "sagemaker:ListPipelineExecutionSteps"
    ],
    "Resource": "arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{tenant_id}-*"
  },
  {
    "Sid": "PipelineAssetAccess",
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:ListBucket"
    ],
    "Resource": [
      "arn:aws:s3:::{company_prefix}-{env}-{tenant_id}-*",
      "arn:aws:s3:::{company_prefix}-{env}-{tenant_id}-*/*"
    ]
  },
  {
    "Sid": "ConfigurationAndSecretsAccess",
    "Effect": "Allow",
    "Action": [
      "secretsmanager:GetSecretValue",
      "ssm:GetParameter",
      "ssm:GetParameters"
    ],
    "Resource": [
      "arn:aws:secretsmanager:*:*:secret:{company_prefix}-{env}-{tenant_id}-*",
      "arn:aws:ssm:*:*:parameter/{company_prefix}-{env}-{tenant_id}/*"
    ]
  },
  {
    "Sid": "ContainerRegistryAccess",
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:DescribeImages",
      "ecr:InitiateLayerUpload",
      "ecr:UploadLayerPart",
      "ecr:CompleteLayerUpload",
      "ecr:PutImage",
      "ecr:TagResource"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToSageMaker",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-{tenant_id}-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "sagemaker.amazonaws.com"
      }
    }
  }
]

"Sid": "SageMakerPipelineManagement"

Grants the CI/CD runner full lifecycle management of SageMaker Pipelines (except deletion), scoped to the tenant’s project namespace.

Pipeline Definition Management

Policy Action

Description

sagemaker:CreatePipeline

Create a new SageMaker Pipeline definition from code when a pipeline doesn’t yet exist.

sagemaker:UpdatePipeline

Update an existing pipeline definition when code is merged — the primary action for iterative CI/CD deployments.

sagemaker:DescribePipeline

Retrieve metadata and configuration of a pipeline definition. Required for the runner to verify the current state before applying updates.

sagemaker:ListPipelines

Discover existing pipelines in the tenant namespace. Required for the runner to determine whether to create a new pipeline or update an existing one.

Pipeline Execution Management

Policy Action

Description

sagemaker:StartPipelineExecution

Trigger a pipeline run automatically after a successful build or code merge.

sagemaker:StopPipelineExecution

Halt a running execution if automated tests or failure conditions are detected.

sagemaker:DescribePipelineExecution

View the status, parameters, and details of a specific execution run. Required for the runner to report pass/fail status back to GitHub/Jenkins.

sagemaker:ListPipelineExecutions

View the history of all runs for a pipeline. Required for the runner to check if a previous execution is still running before starting a new one.

sagemaker:ListPipelineExecutionSteps

Examine individual steps within an execution. Required for the runner to report step-level status (e.g., “training step failed at epoch 5”) back to the CI/CD system.

"Sid": "PipelineAssetAccess"

Grants read-only access to S3 buckets within the tenant namespace for downloading pipeline definitions, code artifacts, and model artifacts.

Policy Action

Description

s3:GetObject

Download pipeline definition files, code packages, and model artifacts stored in S3.

s3:ListBucket

List objects within the tenant’s S3 buckets to verify that required assets exist before pipeline execution.

"Sid": "ConfigurationAndSecretsAccess"

Grants read-only access to configuration and secrets required for pipeline execution, scoped to the tenant namespace.

Policy Action

Description

secretsmanager:GetSecretValue

Retrieve sensitive data (API keys, database credentials, external service tokens) needed during pipeline execution.

ssm:GetParameter

Read a single configuration parameter (e.g., model hyperparameters, feature store endpoints).

ssm:GetParameters

Read multiple configuration parameters in a single call for efficient pipeline initialization.

"Sid": "ContainerRegistryAccess"

Enables the CI/CD runner to authenticate with ECR, pull base images, build new images, and push them to the registry for use in ML pipeline steps.

Authentication

Policy Action

Description

ecr:GetAuthorizationToken

Retrieve a temporary authentication token to authenticate the Docker CLI to the registry.

Read/Pull Actions

Policy Action

Description

ecr:BatchCheckLayerAvailability

Check if specific image layers already exist in the repository (used during both pull and push).

ecr:GetDownloadUrlForLayer

Retrieve a URL to download a specific image layer for pulling base images.

ecr:BatchGetImage

Retrieve image manifests for pulling base images used in pipeline steps.

ecr:DescribeRepositories

View repository metadata to verify target repositories exist before pushing.

ecr:ListImages

List images in a repository to check for existing tags and avoid unnecessary rebuilds.

ecr:DescribeImages

View image metadata (size, push date, tags) for build cache optimization.

Write/Push Actions

Policy Action

Description

ecr:InitiateLayerUpload

Start the multi-step process of uploading a new image layer.

ecr:UploadLayerPart

Upload a segment of an image layer during the push process.

ecr:CompleteLayerUpload

Finalize the layer upload, confirming all parts have been received.

ecr:PutImage

Push the image manifest to the repository, making the complete container image available for use in pipeline steps.

ecr:TagResource

Add or update metadata tags on repositories (e.g., build number, commit SHA).

"Sid": "PassRoleToSageMaker"

Permits the CI/CD runner to pass an execution role to SageMaker so that pipeline steps (training jobs, processing jobs, transform jobs) have the compute permissions they need.

Policy Action

Description

iam:PassRole

Assign a specific service role to the SageMaker Pipeline being created or executed. Conditioned to sagemaker.amazonaws.com only — the runner cannot pass roles to any other AWS service, limiting the blast radius of a compromised runner. Scoped to roles matching {company_prefix}-{env}-{tenant_id}-role-* to prevent passing arbitrary roles.

GitHub Actions Example:

- name: Configure AWS Credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/acme-dev-a001-role-ci-runner
    aws-region: us-west-2

- name: Login to ECR
  run: |
    aws ecr get-login-password --region us-west-2 | \
      docker login --username AWS --password-stdin \
      123456789012.dkr.ecr.us-west-2.amazonaws.com

- name: Build and Push Training Image
  run: |
    docker build -t acme-dev-a001-ml-training:${{ github.sha }} .
    docker tag acme-dev-a001-ml-training:${{ github.sha }} \
      123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}
    docker push \
      123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}

- name: Deploy SageMaker Pipeline
  run: |
    python deploy_pipeline.py \
      --pipeline-name acme-dev-a001-fraud-detection \
      --role-arn ${{ secrets.SAGEMAKER_ROLE_ARN }} \
      --image-uri 123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}

- name: Start Pipeline Execution
  run: |
    aws sagemaker start-pipeline-execution \
      --pipeline-name acme-dev-a001-fraud-detection \
      --pipeline-parameters '[{"Name":"ImageUri","Value":"'$IMAGE_URI'"}]'

Compliance Note: In regulated industries, auditors can trace:

  • Human actions → IAM user CloudTrail logs (project-dev)

  • Automated actions → IAM role CloudTrail logs (project-ci)

This separation satisfies SOC2, HIPAA, and PCI-DSS requirements.


Level 4: project-full¶

⚠ Reference Pattern — Not Generated by sec-provisioner

This policy requires a specific project name in the resource ARN (e.g., fraud-detection, recommendation-engine). Since project names are not known at platform provisioning time, this policy is not generated by the sec-provisioner. It is documented here as a reference pattern to be applied during project onboarding when the project name is known.

Purpose: Full pipeline control for human team members working on a specific ML project

Principal: Human (project engineers and data scientists)

Typical Users:

  • ML engineers (project-focused)

  • Data scientists (running experiments)

  • Project teams (isolated access)

Assignment: Attached to project-specific IAM groups created during project onboarding

How It Differs from project-ci (Level 3):

  • Same resource scope — both are scoped to a single project’s pipelines

  • Different principal — project-ci is for automated CI/CD runners, project-full is for humans

  • Removes runner-specific actions — no iam:PassRole, no SSM/Secrets access, no S3/ECR asset access

  • Adds interactive debugging — ListPipelineExecutionSteps for step-level troubleshooting

  • Adds discovery — ListPipelines for humans to browse their project’s pipelines

  • Adds explicit Deny — DenyCriticalActions Sid blocks sagemaker:DeletePipeline as a safety net (runners don’t need this because they never have delete in their Allow)

What You Can Do:

  • ✅ Create and update pipelines for your project

  • ✅ Start and stop pipeline executions

  • ✅ View execution logs, metrics, and step-level details

  • ✅ List and discover your project’s pipelines

What You Cannot Do:

  • ❌ Access other teams’ pipelines

  • ❌ Modify shared/platform pipelines

  • ❌ Delete pipelines (explicit Deny)

  • ❌ Pass IAM roles or access secrets (those are runner concerns)

Example Scenario:

The fraud-detection team needs to run their training pipeline without accessing the recommendation-engine team’s pipelines. Engineers create, update, and monitor pipelines interactively — but deletion requires a platform admin.

Resource Scope:

All SageMaker resources are scoped to the project name within the tenant’s naming convention. Replace {project} with the actual project name at onboarding time.

arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{project}-*

Example (for Edge Corp, prod environment, fraud-detection project):

arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*

Sample Permissions:

[
  {
    "Sid": "PipelineManagement",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreatePipeline",
      "sagemaker:UpdatePipeline"
    ],
    "Resource": "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
  },
  {
    "Sid": "PipelineExecution",
    "Effect": "Allow",
    "Action": [
      "sagemaker:StartPipelineExecution",
      "sagemaker:StopPipelineExecution"
    ],
    "Resource": "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
  },
  {
    "Sid": "MonitoringAndVisibility",
    "Effect": "Allow",
    "Action": [
      "sagemaker:DescribePipeline",
      "sagemaker:DescribePipelineExecution",
      "sagemaker:ListPipelines",
      "sagemaker:ListPipelineExecutions",
      "sagemaker:ListPipelineExecutionSteps"
    ],
    "Resource": "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
  },
  {
    "Sid": "PipelineLogsAccess",
    "Effect": "Allow",
    "Action": [
      "logs:GetLogEvents",
      "logs:DescribeLogStreams"
    ],
    "Resource": "*"
  },
  {
    "Sid": "DenyCriticalActions",
    "Effect": "Deny",
    "Action": [
      "sagemaker:DeletePipeline"
    ],
    "Resource": "*"
  }
]

"Sid": "PipelineManagement"

Allows users to build and modify project-specific ML workflows. Restricted via the Resource ARN to only pipelines matching the project’s naming convention, preventing interference with other teams.

Policy Action

Description

sagemaker:CreatePipeline

Define a new sequence of ML steps (data prep, training, etc.) for this project.

sagemaker:UpdatePipeline

Modify existing pipeline definitions as project requirements evolve.

"Sid": "PipelineExecution"

Grants operational control to run or halt project experiments. Ensures data scientists can iterate on models without requiring admin intervention.

Policy Action

Description

sagemaker:StartPipelineExecution

Triggers a new run of the ML pipeline using specified data or parameters.

sagemaker:StopPipelineExecution

Allows engineers to manually kill a run if errors are detected, saving compute costs.

"Sid": "MonitoringAndVisibility"

Provides read access for interactive debugging and troubleshooting. Unlike project-ci (which only needs execution-level status), humans need step-level detail and pipeline discovery to work effectively.

Policy Action

Description

sagemaker:DescribePipeline

Retrieves pipeline metadata including ARN, name, creation time, status, and associated IAM identity.

sagemaker:DescribePipelineExecution

Returns details about a specific execution such as ARN, status, creation time, and failure reasons.

sagemaker:ListPipelines

Discover all pipelines within the project scope. Humans need this to browse and select pipelines interactively.

sagemaker:ListPipelineExecutions

View the history of all runs for a pipeline. Lists execution summaries for troubleshooting and tracking.

sagemaker:ListPipelineExecutionSteps

Inspect individual steps within an execution. Essential for humans debugging which step failed and why.

"Sid": "PipelineLogsAccess"

Separated from MonitoringAndVisibility because CloudWatch log group names are generated by AWS at runtime and cannot be scoped to a project prefix. Uses Resource: * out of necessity, not by choice.

Policy Action

Description

logs:GetLogEvents

Retrieves log events from a CloudWatch Logs log stream, allowing filtering by time range.

logs:DescribeLogStreams

Lists log streams within a log group, with options to filter by prefix or order by last event time.

"Sid": "DenyCriticalActions"

Explicit safety net to prevent accidental or unauthorized deletion. An explicit Deny always overrides an Allow in IAM, ensuring that no other policy — including any future policy changes — can grant deletion rights to this group.

Policy Action

Description

sagemaker:DeletePipeline

Specifically blocked to ensure that even project members cannot permanently remove pipeline infrastructure. Deletion is reserved for Level 5: platform-full.


Level 5: platform-full¶

Purpose: Platform-wide pipeline management across all projects and tenants

Principal: Human (platform administrators)

Typical Users:

  • MLOps platform team

  • Pipeline infrastructure owners

  • Cross-project coordinators

Assignment: Platform admin IAM groups (e.g., {company_prefix}-{env}-group-platform-admins)

How It Differs from project-full (Level 4):

  • Scope breaks out from project to account-wide — Resource: * instead of project-scoped ARNs

  • Adds delete — sagemaker:DeletePipeline and codepipeline:DeletePipeline (only level that can delete)

  • Adds CodePipeline management — Levels 1-4 focus on SageMaker Pipelines; Level 5 adds full CI/CD pipeline control

  • Adds governance actions — approval gates, stage transitions, pipeline freezing

  • Adds PassRole — can assign IAM roles to pipelines (scoped to SageMaker and CodePipeline services)

  • No explicit Deny — this is the level where delete is intentionally allowed

What You Can Do:

  • ✅ Manage all SageMaker pipelines across all projects and tenants

  • ✅ Manage all CodePipeline CI/CD pipelines across the platform

  • ✅ Create shared/platform pipelines

  • ✅ Delete obsolete pipelines (SageMaker and CodePipeline)

  • ✅ Approve/reject deployment gates

  • ✅ Freeze and unfreeze pipeline stages

  • ✅ Assign IAM roles to pipelines

What You Cannot Do:

  • ❌ Nothing — full pipeline access across both SageMaker and CodePipeline

Example Scenario:

The MLOps team maintains a shared data preprocessing pipeline used by all ML projects and needs to update it with new validation steps. They also need to decommission a retired project’s pipelines and approve a production deployment gate.

Resource Scope:

Account-wide — no tenant or project scoping. Platform admins need cross-cutting access to manage the entire pipeline infrastructure.

Resource: "*"

Sample Permissions:

[
  {
    "Sid": "SageMakerPipelineFullAccess",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreatePipeline",
      "sagemaker:UpdatePipeline",
      "sagemaker:DeletePipeline",
      "sagemaker:DescribePipeline",
      "sagemaker:ListPipelines",
      "sagemaker:StartPipelineExecution",
      "sagemaker:StopPipelineExecution",
      "sagemaker:DescribePipelineExecution",
      "sagemaker:ListPipelineExecutions",
      "sagemaker:ListPipelineExecutionSteps"
    ],
    "Resource": "*"
  },
  {
    "Sid": "CodePipelineFullAccess",
    "Effect": "Allow",
    "Action": [
      "codepipeline:CreatePipeline",
      "codepipeline:UpdatePipeline",
      "codepipeline:DeletePipeline",
      "codepipeline:GetPipeline",
      "codepipeline:ListPipelines",
      "codepipeline:GetPipelineState",
      "codepipeline:GetPipelineExecution",
      "codepipeline:StartPipelineExecution",
      "codepipeline:StopPipelineExecution",
      "codepipeline:RetryStageExecution",
      "codepipeline:RollbackStage"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PipelineGovernance",
    "Effect": "Allow",
    "Action": [
      "codepipeline:PutApprovalResult",
      "codepipeline:DisableStageTransition",
      "codepipeline:EnableStageTransition"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PipelineLogsFullAccess",
    "Effect": "Allow",
    "Action": [
      "logs:GetLogEvents",
      "logs:DescribeLogStreams",
      "logs:DescribeLogGroups",
      "logs:FilterLogEvents"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToPipelineServices",
    "Effect": "Allow",
    "Action": [
      "iam:PassRole"
    ],
    "Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": [
          "sagemaker.amazonaws.com",
          "codepipeline.amazonaws.com"
        ]
      }
    }
  }
]

"Sid": "SageMakerPipelineFullAccess"

Full control over all SageMaker Pipelines across every project and tenant. This is the only level that includes sagemaker:DeletePipeline — all lower levels either omit it or explicitly deny it. Platform admins use this to manage the complete lifecycle of ML pipelines including decommissioning retired projects.

Policy Action

Description

sagemaker:CreatePipeline

Create new SageMaker pipeline definitions for any project or shared infrastructure.

sagemaker:UpdatePipeline

Modify any existing pipeline definition across the platform.

sagemaker:DeletePipeline

Permanently remove obsolete or retired pipelines. Only available at this level.

sagemaker:DescribePipeline

Retrieve metadata for any pipeline including ARN, status, and associated IAM identity.

sagemaker:ListPipelines

Discover all pipelines across the entire account for cross-project visibility.

sagemaker:StartPipelineExecution

Trigger execution of any pipeline for cross-project coordination or incident response.

sagemaker:StopPipelineExecution

Halt any running pipeline execution across the platform.

sagemaker:DescribePipelineExecution

Inspect execution details including status, timing, and failure reasons for any pipeline.

sagemaker:ListPipelineExecutions

View execution history across all pipelines for platform-wide monitoring.

sagemaker:ListPipelineExecutionSteps

Inspect step-level details within any execution for deep troubleshooting.

"Sid": "CodePipelineFullAccess"

Full control over all CodePipeline CI/CD pipelines. This extends platform-full beyond SageMaker into the CI/CD layer, giving the MLOps team end-to-end pipeline management from source code through to model deployment.

Policy Action

Description

codepipeline:CreatePipeline

Create new CI/CD pipelines for any project or shared infrastructure.

codepipeline:UpdatePipeline

Modify the structure or settings of any existing pipeline.

codepipeline:DeletePipeline

Permanently remove obsolete CI/CD pipeline configurations.

codepipeline:GetPipeline

View the JSON structure and configuration of any pipeline.

codepipeline:ListPipelines

List all CI/CD pipelines in the account for platform-wide visibility.

codepipeline:GetPipelineState

Real-time view of stage and action status (Succeeded, In Progress, Failed).

codepipeline:GetPipelineExecution

View details and history of a specific execution run.

codepipeline:StartPipelineExecution

Manually trigger any pipeline for cross-project coordination.

codepipeline:StopPipelineExecution

Force-stop a running pipeline mid-process.

codepipeline:RetryStageExecution

Restart a failed stage without rerunning the entire pipeline.

codepipeline:RollbackStage

Revert a stage to a previous successful state for incident recovery.

"Sid": "PipelineGovernance"

Governance actions for deployment control. Allows platform admins to approve or reject deployment gates, freeze pipeline stages during incidents, and resume flow when resolved. Separated from CodePipelineFullAccess because these are administrative/governance actions, not pipeline CRUD.

Policy Action

Description

codepipeline:PutApprovalResult

Approve or reject a manual approval gate to move a deployment forward.

codepipeline:DisableStageTransition

Freeze a pipeline stage to prevent progression (e.g., during an incident or change freeze).

codepipeline:EnableStageTransition

Re-enable flow between stages after a freeze is lifted.

"Sid": "PipelineLogsFullAccess"

Full CloudWatch Logs access for platform-wide pipeline troubleshooting. Adds DescribeLogGroups and FilterLogEvents beyond what lower levels have — platform admins need to discover log groups across all projects and search across log streams.

Policy Action

Description

logs:GetLogEvents

Retrieve log events from any pipeline’s log stream.

logs:DescribeLogStreams

List log streams within any log group for cross-project investigation.

logs:DescribeLogGroups

Discover all log groups across the account — needed for platform-wide visibility.

logs:FilterLogEvents

Search across log streams within a log group — essential for incident investigation across projects.

"Sid": "PassRoleToPipelineServices"

Allows platform admins to assign IAM roles to both SageMaker and CodePipeline services. Scoped to roles matching the platform’s naming convention and conditioned to only pass roles to pipeline services — prevents using this permission to escalate privileges to other AWS services.

Policy Action

Description

iam:PassRole

Assign IAM service roles to pipelines. Scoped to {company_prefix}-{env}-*-role-* and conditioned to sagemaker.amazonaws.com and codepipeline.amazonaws.com only.


Inference Policies¶

Inference policies control access to deployed ML models and prediction services. Unlike S3, ECR, or Pipeline policies which each target a single AWS service, inference spans multiple services — each with its own permission model and use cases.

Service

Use Case

Levels

SageMaker Inference

Real-time endpoints, batch transform, async/serverless inference, autoscaling

4

Lambda Inference

Lightweight model serving, custom inference containers, event-driven predictions

3

Bedrock Inference

Foundation model invocation, cross-region inference, provisioned throughput

3

Each service gets the number of levels its permission model actually needs — no artificial uniformity.


SageMaker Inference¶

SageMaker Inference policies control access to deployed ML models and endpoints. The level progression is invoke-centric: who can call the model, and in which environment.

Level 1: read-only¶

Purpose: This IAM policy provides read-only access for monitoring the health of SageMaker endpoints without granting permissions to invoke predictions (no inference costs)

Principal: Human (auditors, monitoring teams)

Typical Users:

  • Compliance auditors

  • QA teams

  • Monitoring dashboards

  • Cost optimization analysts

  • New team members learning the platform

What You Can Do:

  • ✅ View endpoint status and health

  • ✅ List all endpoints and configurations

  • ✅ See endpoint metadata (instance type, model version)

  • ✅ Monitor endpoint metrics (latency, error rates)

  • ✅ Check autoscaling settings

What You Cannot Do:

  • ❌ Invoke endpoints (send prediction requests)

  • ❌ Create or modify endpoints

  • ❌ Delete endpoints

Example Scenario:

Sarah is a QA engineer who needs to verify that all production endpoints are using the approved instance types and have autoscaling enabled. She needs to see endpoint configurations but doesn’t need to send prediction requests.

Sample Permissions:

[
    {
        "Sid": "SageMakerEndpointReadOnly",
        "Effect": "Allow",
        "Action": [
            "sagemaker:ListEndpoints",
            "sagemaker:DescribeEndpoint",
            "sagemaker:ListEndpointConfigs",
            "sagemaker:DescribeEndpointConfig",
            "sagemaker:ListModels",
            "sagemaker:DescribeModel",
            "sagemaker:DescribeModelPackage",
            "sagemaker:ListModelPackages"
        ],
        "Resource": "*"
    },
    {
        "Sid": "CloudWatchMetricsReadOnly",
        "Effect": "Allow",
        "Action": [
            "cloudwatch:GetMetricData",
            "cloudwatch:GetMetricStatistics",
            "cloudwatch:ListMetrics"
        ],
        "Resource": "*"
    },
    {
        "Sid": "AutoScalingReadOnly",
        "Effect": "Allow",
        "Action": [
            "application-autoscaling:DescribeScalableTargets",
            "application-autoscaling:DescribeScalingPolicies"
        ],
        "Resource": "*"
    },
      {
        "Sid": "ExplicitDenyInference",
        "Effect": "Deny",
        "Action": [
            "sagemaker:InvokeEndpoint",
            "sagemaker:InvokeEndpointAsync"
        ],
        "Resource": "*"
    }      
]

"Sid": "SageMakerEndpointReadOnly"

Grants view-only access to SageMaker hosting resources, allowing the user to see lists and configurations for models and endpoints.

Policy Action

Description

sagemaker:ListEndpoints

Returns a list of all existing endpoints in your account, including their names and ARNs.

sagemaker:DescribeEndpoint

Returns detailed information about a specific endpoint (e.g., status, configuration name).

sagemaker:ListEndpointConfigs

Lists all endpoint configurations (the blueprints for endpoints).

sagemaker:DescribeEndpointConfig

Returns details of an endpoint configuration, such as instance types and model names.

sagemaker:ListModels

Lists all models currently created in SageMaker.

sagemaker:DescribeModel

Returns details about a model, including the container image and execution role.

sagemaker:DescribeModelPackage

Provides information about a specific versioned model package.

sagemaker:ListModelPackages

Lists all available model packages or groups.

"Sid": "CloudWatchMetricsReadOnly"

Provides access to view performance data and statistics for monitoring the health and usage of the resources.

Policy Action

Description

cloudwatch:GetMetricData

Retrieves raw data points for various metrics across multiple resources.

cloudwatch:GetMetricStatistics

Gets specific statistical data (Average, Sum, Max, etc.) for a metric.

cloudwatch:ListMetrics

Lists all the valid metric names available to be viewed.

"Sid": "AutoScalingReadOnly"

Allows the user to see the scaling configurations and policies applied to the endpoints.

Policy Action

Description

application-autoscaling:DescribeScalableTargets

Shows which resources (endpoints) are set up to scale automatically.

application-autoscaling:DescribeScalingPolicies

Shows the specific rules that trigger a scale-up or scale-down event.

"Sid": "ExplicitDenyInference"

Specifically blocks the ability to actually run data through an endpoint for predictions, ensuring the policy remains “read-only.”

Policy Action

Description

sagemaker:InvokeEndpoint

(Denied) The action required to send a synchronous request to an endpoint for a prediction.

sagemaker:InvokeEndpointAsync

(Denied) The action required to send an asynchronous request for long-running inferences.

Cost Benefit: No InvokeEndpoint permission means no inference charges - perfect for monitoring and audit use cases.


Level 1-prod: read-only-invoke¶

Purpose: Read-only monitoring access with production invoke permissions. Designed for non-technical users who consume model predictions through dashboards and applications.

Principal: Human (business consumers, product managers, analysts)

Typical Users:

  • Business consumers using ML-powered dashboards

  • Product managers validating model outputs

  • Analysts running predictions for business decisions

  • Applications calling endpoints on behalf of business users

How It Differs from read-only (Level 1):

  • Adds invoke — can send prediction requests to endpoints in all environments including production

  • Same read-only baseline — identical list/describe/monitor permissions

  • No endpoint lifecycle — cannot create, modify, or delete endpoints

What You Can Do:

  • ✅ Everything in read-only (Level 1), PLUS:

  • ✅ Invoke sandbox, dev, staging, and production endpoints

  • ✅ Send real-time and async prediction requests

  • ✅ Consume ML models through applications and dashboards

What You Cannot Do:

  • ❌ Create, modify, or delete endpoints or endpoint configs

  • ❌ Register or manage models

  • ❌ Access training jobs or notebooks

Example Scenario:

Lisa is a product manager who uses an internal dashboard powered by a fraud detection model. The dashboard calls the production SageMaker endpoint to score transactions in real-time. Lisa needs invoke access to production but should never modify the endpoint or model behind it.

Sample Permissions:

[
    {
        "Sid": "SageMakerEndpointReadOnly",
        "Effect": "Allow",
        "Action": [
            "sagemaker:ListEndpoints",
            "sagemaker:DescribeEndpoint",
            "sagemaker:ListEndpointConfigs",
            "sagemaker:DescribeEndpointConfig",
            "sagemaker:ListModels",
            "sagemaker:DescribeModel",
            "sagemaker:DescribeModelPackage",
            "sagemaker:ListModelPackages"
        ],
        "Resource": "*"
    },
    {
        "Sid": "CloudWatchMetricsReadOnly",
        "Effect": "Allow",
        "Action": [
            "cloudwatch:GetMetricData",
            "cloudwatch:GetMetricStatistics",
            "cloudwatch:ListMetrics"
        ],
        "Resource": "*"
    },
    {
        "Sid": "AutoScalingReadOnly",
        "Effect": "Allow",
        "Action": [
            "application-autoscaling:DescribeScalableTargets",
            "application-autoscaling:DescribeScalingPolicies"
        ],
        "Resource": "*"
    },
    {
        "Sid": "InvokeAllEnvironments",
        "Effect": "Allow",
        "Action": [
            "sagemaker:InvokeEndpoint",
            "sagemaker:InvokeEndpointAsync"
        ],
        "Resource": "arn:aws:sagemaker:*:*:endpoint/*"
    }
]

"Sid": "SageMakerEndpointReadOnly"

Identical to Level 1. Grants view-only access to SageMaker hosting resources.

"Sid": "CloudWatchMetricsReadOnly"

Identical to Level 1. Provides access to view performance data and statistics.

"Sid": "AutoScalingReadOnly"

Identical to Level 1. Allows the user to see scaling configurations.

"Sid": "InvokeAllEnvironments"

Allows sending prediction requests to endpoints in all environments (sandbox, dev, staging, production). Scoped to endpoint/* — no access to endpoint configs, models, or training resources.

Policy Action

Description

sagemaker:InvokeEndpoint

Sends a real-time inference request to any running endpoint to get a prediction.

sagemaker:InvokeEndpointAsync

Sends an inference request to any asynchronous endpoint (used for large payloads or long processing times).

Key Difference from Level 1: No ExplicitDenyInference Sid. Instead, invoke is explicitly allowed across all environments. The read-only baseline remains identical.

Cost Consideration: Unlike Level 1, this level incurs inference costs per invocation. Use API throttling and service quotas to manage cost risk rather than IAM restrictions.


Level 2: dev-invoke¶

Purpose: Test models in development and staging environments

Principal: Human (data scientists, ML engineers)

Typical Users:

  • Data scientists (A/B testing)

  • ML engineers (model validation)

  • QA teams (integration testing)

  • Development applications

How It Differs from read-only (Level 1):

  • Adds invoke — can send prediction requests to dev/staging endpoints

  • Adds model registration — can register trained models in SageMaker Model Registry

  • Environment-scoped — restricted to {prefix}-dev-* and {prefix}-staging-* endpoints

  • Sandbox/dev/staging endpoint lifecycle — can create, modify, and delete endpoints in non-production environments

  • Production endpoints blocked — explicit deny on *-prod-* endpoints and endpoint configs

What You Can Do:

  • ✅ Everything in read-only, PLUS:

  • ✅ Invoke dev endpoints for testing

  • ✅ Invoke staging endpoints for validation

  • ✅ Send test prediction requests

  • ✅ Validate model responses

  • ✅ Register trained models in SageMaker Model Registry

  • ✅ Create model package groups for organizing model versions

  • ✅ Create model definitions (link artifacts + container)

  • ✅ Create, update, and delete sandbox/dev/staging endpoints

  • ✅ Create endpoint configs for non-production environments

What You Cannot Do:

  • ❌ Invoke production endpoints

  • ❌ Create, modify, or delete production (*-prod-*) endpoints

  • ❌ Create production endpoint configs

  • ❌ Approve or reject model packages (governance responsibility)

Example Scenario:

Marcus is a data scientist who deployed two fraud detection models to the staging environment. He needs to send test transactions to both endpoints to compare their accuracy before promoting the winner to production.

Sample Permissions:

[
  {
    "Sid": "SageMakerReadOnlyAccess",
    "Effect": "Allow",
    "Action": [
      "sagemaker:ListEndpoints",
      "sagemaker:DescribeEndpoint",
      "sagemaker:ListEndpointConfigs",
      "sagemaker:DescribeEndpointConfig",
      "sagemaker:ListModels",
      "sagemaker:DescribeModel",
      "sagemaker:DescribeModelPackage",
      "sagemaker:ListModelPackages"
      "sagemaker:GetSearchSuggestions"
    ],
    "Resource": "*"
  },
  {
    "Sid": "SageMakerInvokeDevStagingEndpoints",
    "Effect": "Allow",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "Resource": [
      "arn:aws:sagemaker:*:*:endpoint/*-sandbox-*",
      "arn:aws:sagemaker:*:*:endpoint/*-dev-*",
      "arn:aws:sagemaker:*:*:endpoint/*-staging-*"
    ]
  },
  {
    "Sid": "EndpointLifecycleNonProd",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:UpdateEndpoint",
      "sagemaker:DeleteEndpoint",
      "sagemaker:DeleteEndpointConfig"
    ],
    "Resource": [
      "arn:aws:sagemaker:*:*:endpoint/*-sandbox-*",
      "arn:aws:sagemaker:*:*:endpoint/*-dev-*",
      "arn:aws:sagemaker:*:*:endpoint/*-staging-*",
      "arn:aws:sagemaker:*:*:endpoint-config/*-sandbox-*",
      "arn:aws:sagemaker:*:*:endpoint-config/*-dev-*",
      "arn:aws:sagemaker:*:*:endpoint-config/*-staging-*"
    ]
  },
  {
    "Sid": "ModelRegistration",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateModel",
      "sagemaker:CreateModelPackage",
      "sagemaker:CreateModelPackageGroup"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ExplicitDenyProductionAndLifecycle",
    "Effect": "Deny",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:UpdateEndpoint",
      "sagemaker:DeleteEndpoint",
      "sagemaker:DeleteEndpointConfig"
    ],
    "Resource": [
      "arn:aws:sagemaker:*:*:endpoint/*-prod-*",
      "arn:aws:sagemaker:*:*:endpoint-config/*-prod-*"
    ]
  }
]

"Sid": "SageMakerReadOnlyAccess"

Grants broad permission to view and list SageMaker resources and search suggestions across the entire account.

Policy Action

Description

sagemaker:Describe

Retrieves detailed information about a resource (e.g., training jobs, models, or endpoints).

sagemaker:List*

Lists resources of a specific type to see what exists in the environment.

sagemaker:GetSearchSuggestions

Provides auto-complete suggestions for SageMaker search queries.

"Sid": "SageMakerInvokeDevStagingEndpoints"

Allows the user to send data to specific SageMaker endpoints named as “sandbox”, “dev”, or “staging.”

Policy Action

Description

sagemaker:InvokeEndpoint

Sends a real-time inference request to a running endpoint to get a prediction.

sagemaker:InvokeEndpointAsync

Sends an inference request to an asynchronous endpoint (used for large payloads or long processing times).

"Sid": "EndpointLifecycleNonProd"

Allows creating, updating, and deleting endpoints and endpoint configurations in non-production environments (sandbox, dev, staging). This enables ML engineers and data scientists to test real-time inference latency, validate inference logic, and iterate on endpoint configurations before handing off to MLOps for production deployment.

Policy Action

Description

sagemaker:CreateEndpoint

Creates a new endpoint using a specific endpoint configuration. Scoped to sandbox/dev/staging naming patterns.

sagemaker:CreateEndpointConfig

Defines the hardware (instance type, count) and model specifications for an endpoint. Scoped to non-production.

sagemaker:UpdateEndpoint

Deploys a new model or configuration to an existing non-production endpoint.

sagemaker:DeleteEndpoint

Shuts down and removes a non-production endpoint to stop incurring costs.

sagemaker:DeleteEndpointConfig

Removes an endpoint configuration that is no longer needed in non-production.

"Sid": "ModelRegistration"

Allows data scientists to register trained models in the SageMaker Model Registry after training. This is a development activity — the model sits in the registry awaiting approval from MLOps or governance teams before production deployment.

Policy Action

Description

sagemaker:CreateModel

Creates a model definition in SageMaker by specifying the Docker container image, model artifacts (from S3), and inference code. This does not deploy the model — it only defines it.

sagemaker:CreateModelPackage

Registers a versioned model package in the Model Registry. This is the primary action for submitting a trained model for review and approval.

sagemaker:CreateModelPackageGroup

Creates a model package group to organize related model versions (e.g., all versions of a fraud detection model). Typically done once per model project.

"Sid": "ExplicitDenyProductionAndLifecycle"

A strict guardrail that blocks any interaction with production (*-prod-*) endpoints and their endpoint configurations. Non-production environments (sandbox, dev, staging) are allowed.

Policy Action Denied

Description

sagemaker:CreateEndpoint

(Denied) Creates a new endpoint using a specific endpoint configuration.

sagemaker:CreateEndpointConfig

(Denied) Defines the hardware and model specifications for an endpoint.

sagemaker:UpdateEndpoint

(Denied) Deploys a new model or configuration to an existing endpoint.

sagemaker:DeleteEndpoint

(Denied) Shuts down and removes a production endpoint to stop incurring costs.

sagemaker:DeleteEndpointConfig

(Denied) Removes a production endpoint configuration.

Python Example:

import boto3
import json

runtime = boto3.client('sagemaker-runtime')

# Test staging endpoint
response = runtime.invoke_endpoint(
    EndpointName='acme-dev-fraud-detection-v2',
    ContentType='application/json',
    Body=json.dumps({
        'transaction_amount': 1500.00,
        'merchant_category': 'electronics'
    })
)

prediction = json.loads(response['Body'].read())
print(f"Fraud probability: {prediction['fraud_score']}")

Safety Feature: Cannot accidentally invoke production endpoints during testing - prevents costly mistakes and data contamination.


Level 3: prod-invoke¶

Purpose: Production applications invoking production models

Principal: Machine (backend services, APIs) or Human (production support)

Typical Users:

  • Backend API services

  • Production web applications

  • Mobile app backends

  • Real-time fraud detection systems

  • Customer-facing chatbots

How It Differs from dev-invoke (Level 2):

  • Switches environment scope — production endpoints only, no dev/staging

  • Typically assigned to service roles — production apps use IAM roles, not user credentials

  • Higher accountability — every invocation serves real customers

What You Can Do:

  • ✅ Everything in read-only, PLUS:

  • ✅ Invoke production endpoints only

  • ✅ Send real customer prediction requests

  • ✅ Receive model responses for business logic

What You Cannot Do:

  • ❌ Invoke dev or staging endpoints

  • ❌ Create or modify endpoints

  • ❌ Delete endpoints

Example Scenario:

The fraud detection API service receives transaction requests from the payment gateway. For each transaction, it calls the production fraud model endpoint and blocks transactions with fraud scores above 0.85.

Sample Permissions:

[
  {
    "Sid": "AllowProductionEndpointDiscovery",
    "Effect": "Allow",
    "Action": [
      "sagemaker:DescribeEndpoint",
      "sagemaker:ListEndpoints"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AllowProductionModelInvocation",
    "Effect": "Allow",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "Resource": "arn:aws:sagemaker:*:*:endpoint/*-prod-*"
  },
  {
    "Sid": "DenyNonProductionInvocation",
    "Effect": "Deny",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "NotResource": "arn:aws:sagemaker:*:*:endpoint/*-prod-*"
  },
  {
    "Sid": "DenyEndpointModifications",
    "Effect": "Deny",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:UpdateEndpoint",
      "sagemaker:DeleteEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:DeleteEndpointConfig"
    ],
    "Resource": "*"
  }
]

"Sid": "AllowProductionEndpointDiscovery"

Enables the principal to find and view the status of endpoints that follow the -prod- naming convention.

Policy Action

Description

sagemaker:DescribeEndpoint

Returns detailed information about an endpoint, such as its current status and configuration name.

sagemaker:ListEndpoints

Lists the SageMaker endpoints in the account, allowing the user to see what is available.

"Sid": "AllowProductionModelInvocation"

Grants the core permission to send data to and receive predictions from production-ready models.

Policy Action

Description

sagemaker:InvokeEndpoint

Sends a synchronous request to an endpoint for low-latency, real-time machine learning inferences.

sagemaker:InvokeEndpointAsync

Sends an inference request to an asynchronous endpoint, suitable for large payloads or long processing times.

"Sid": "DenyNonProductionInvocation"

A guardrail that explicitly prevents this principal from hitting any endpoint not explicitly tagged or named as “prod,” preventing accidental cross-environment leakage.

Policy Action Denied

Description

sagemaker:InvokeEndpoint

(Denied) Sends a synchronous request to an endpoint for low-latency, real-time machine learning inferences.

sagemaker:InvokeEndpointAsync

(Denied) Sends an inference request to an asynchronous endpoint, suitable for large payloads or long processing times.

"Sid": "DenyEndpointModifications"

Ensures the principal cannot change the infrastructure, such as deleting models or scaling configs, maintaining environment stability.

Policy Action Denied

Description

sagemaker:CreateEndpoint

(Denied) Creates a new SageMaker endpoint using a specific endpoint configuration.

sagemaker:UpdateEndpoint

(Denied) Deploys a new endpoint configuration to an existing endpoint without taking it offline.

sagemaker:DeleteEndpoint

(Denied) Permanently removes an existing SageMaker endpoint and stops the associated hosting instances.

sagemaker:CreateEndpointConfig

(Denied) Defines a setup for an endpoint, specifying which models to deploy and the hardware instance types to use.

sagemaker:DeleteEndpointConfig

(Denied) Deletes a previously created endpoint configuration.

API Gateway Integration:

import boto3
import json
from flask import Flask, request, jsonify

app = Flask(__name__)
runtime = boto3.client('sagemaker-runtime')

@app.route('/check-fraud', methods=['POST'])
def check_fraud():
    transaction = request.json
    
    # Call production endpoint
    response = runtime.invoke_endpoint(
        EndpointName='acme-prod-fraud-detection',
        ContentType='application/json',
        Body=json.dumps(transaction)
    )
    
    prediction = json.loads(response['Body'].read())
    
    return jsonify({
        'transaction_id': transaction['id'],
        'fraud_score': prediction['fraud_score'],
        'action': 'block' if prediction['fraud_score'] > 0.85 else 'approve'
    })

Security Benefit: Production applications cannot call unstable dev/staging endpoints - ensures reliability and data integrity.


Level 4: full¶

Purpose: Complete endpoint lifecycle management across all environments

Principal: Human (MLOps engineers, platform admins)

Typical Users:

  • MLOps engineers

  • Platform administrators

  • Deployment automation (CI/CD)

  • Infrastructure team

How It Differs from prod-invoke (Level 3):

  • Adds lifecycle management — create, update, configure, and delete endpoints

  • Adds autoscaling control — configure scaling policies and instance counts

  • Account-wide scope — Resource: * across all environments

  • Includes delete — can decommission obsolete endpoints

What You Can Do:

  • ✅ Everything in prod-invoke, PLUS:

  • ✅ Create new endpoints

  • ✅ Update endpoint configurations

  • ✅ Deploy new model versions

  • ✅ Configure autoscaling policies

  • ✅ Delete obsolete endpoints

  • ✅ Manage endpoint tags

What You Cannot Do:

  • ❌ Nothing - this is full endpoint management

Example Scenario:

The MLOps team needs to deploy a new fraud detection model to production. They create an endpoint configuration with the new model, create the endpoint with 2 instances, enable autoscaling, and gradually shift traffic from the old endpoint using blue/green deployment.

Sample Permissions:

[
  {
    "Sid": "SageMakerEndpointLifecycleManagement",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:UpdateEndpoint",
      "sagemaker:UpdateEndpointWeightsAndCapacities",
      "sagemaker:DeleteEndpoint",
      "sagemaker:DeleteEndpointConfig",
      "sagemaker:DescribeEndpoint",
      "sagemaker:DescribeEndpointConfig",
      "sagemaker:ListEndpoints",
      "sagemaker:ListEndpointConfigs"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ModelManagement",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateModel",
      "sagemaker:DescribeModel",
      "sagemaker:DeleteModel",
      "sagemaker:ListModels"
    ],
    "Resource": "*"
  },
  {
    "Sid": "InferenceExecution",
    "Effect": "Allow",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AutoscalingAndMonitoring",
    "Effect": "Allow",
    "Action": [
      "application-autoscaling:RegisterScalableTarget",
      "application-autoscaling:DeregisterScalableTarget",
      "application-autoscaling:PutScalingPolicy",
      "application-autoscaling:DeleteScalingPolicy",
      "application-autoscaling:DescribeScalableTargets",
      "application-autoscaling:DescribeScalingPolicies",
      "cloudwatch:PutMetricAlarm",
      "cloudwatch:DescribeAlarms",
      "cloudwatch:DeleteAlarms"
    ],
    "Resource": "*"
  },
  {
    "Sid": "TaggingAndMetadata",
    "Effect": "Allow",
    "Action": [
      "sagemaker:AddTags",
      "sagemaker:DeleteTags",
      "sagemaker:ListTags"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToSageMaker",
    "Effect": "Allow",
    "Action": [
      "iam:PassRole"
    ],
    "Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "sagemaker.amazonaws.com"
      }
    }
  },
  {
    "Sid": "ExplicitDenyDataDeletion",
    "Effect": "Deny",
    "Action": [
      "sagemaker:DeleteDomain",
      "sagemaker:DeleteUserProfile"
    ],
    "Resource": "*"
  }
]

"Sid": "SageMakerEndpointLifecycleManagement"

Controls the core lifecycle of hosting, including creating, updating, and deleting the physical endpoints and their configurations. Also includes discovery actions (List/Describe) so admins have full visibility.

Policy Action

Description

sagemaker:CreateEndpoint

Launches the actual HTTPS endpoint based on a specific configuration. Once this action completes, the endpoint is “InService” and ready to process inference requests.

sagemaker:CreateEndpointConfig

Defines the configuration for a model deployment. It acts as a “blueprint” that specifies exactly how SageMaker should host your machine learning model before you actually create the live endpoint.

sagemaker:UpdateEndpoint

Switches an endpoint to a new configuration (e.g., rolling out a new model version). This is typically used for “Blue/Green” deployments to swap a model version or change instance types without downtime.

sagemaker:UpdateEndpointWeightsAndCapacities

Dynamically adjusts the traffic distribution and instance counts of models (production variants) hosted on an active endpoint. Unlike UpdateEndpoint, which often involves deploying a new configuration and can trigger a rolling update, this operation allows you to make “in-place” adjustments to existing variants without changing the underlying Endpoint Config.

sagemaker:DeleteEndpoint

Shuts down the hosted infrastructure (ENDPOINT) and stops incurring charges. This action does not delete the configuration or the models themselves.

sagemaker:DeleteEndpointConfig

Permanently removes the specified endpoint configuration blueprint. You cannot delete a configuration that is currently being used by a live or updating endpoint.

sagemaker:DescribeEndpoint

Views the current status and details of a live SageMaker endpoint.

sagemaker:DescribeEndpointConfig

Retrieves the specific settings defined in an endpoint configuration.

sagemaker:ListEndpoints

Lists all endpoints in the account for platform-wide visibility.

sagemaker:ListEndpointConfigs

Lists all endpoint configurations to browse existing blueprints.

"Sid": "ModelManagement"

Allows the definition of the software/model artifacts that the endpoints will run.

Policy Action

Description

sagemaker:CreateModel

Grants permission to create a model in SageMaker. This process involves naming the model and specifying the Docker container image, model artifacts (usually from S3), and inference code required for deployment.

sagemaker:DescribeModel

Grants permission to view the details of a specific model. This returns information about the model’s configuration, such as the primary container, execution role, and creation time.

sagemaker:DeleteModel

Grants permission to delete a model resource. This action only removes the model entry in SageMaker; it does not delete the underlying model artifacts in S3 or the associated IAM roles.

sagemaker:ListModels

Lists all models in the account. Admins need this to discover and audit models before associating them with endpoint configurations.

"Sid": "InferenceExecution"

Grants the ability to send data to endpoints across all environments. At Level 4, there is no environment restriction — admins need to invoke any endpoint for testing, validation, and troubleshooting.

Policy Action

Description

sagemaker:InvokeEndpoint

Sends data to a real-time endpoint for a prediction/inference response.

sagemaker:InvokeEndpointAsync

Sends data to an asynchronous endpoint for inference. Unlike a real-time request, the model processes the data in the background and saves the prediction result to an S3 bucket rather than returning it immediately.

"Sid": "AutoscalingAndMonitoring"

Manages the horizontal scaling rules (adding/removing instances) based on traffic demand.

Policy Action

Description

application-autoscaling:RegisterScalableTarget

Registers an AWS or custom resource as a scalable target, allowing Application Auto Scaling to manage it. It also sets or updates the minimum and maximum capacity limits.

application-autoscaling:DeregisterScalableTarget

Removes a resource from being a scalable target. This action also deletes all associated scaling policies and scheduled actions for that resource.

application-autoscaling:PutScalingPolicy

Creates or updates a scaling policy (target tracking, step scaling, or predictive) for a registered scalable target to automate capacity adjustments.

application-autoscaling:DeleteScalingPolicy

Deletes a specific scaling policy. For target tracking, it also removes the CloudWatch alarms created on your behalf; for step scaling, it deletes the alarm action but not the alarm itself.

application-autoscaling:DescribeScalableTargets

Retrieves detailed information about one or more scalable targets in a specified service namespace, including their current capacity limits.

application-autoscaling:DescribeScalingPolicies

Returns information about the scaling policies for the specified service namespace and scalable targets.

cloudwatch:PutMetricAlarm

Creates or updates an alarm and associates it with a specific metric. In an autoscaling context, these alarms trigger the scaling policies when thresholds are breached.

cloudwatch:DescribeAlarms

Retrieves information about specified alarms. It is often used to verify the status or configuration of alarms used by autoscaling policies.

cloudwatch:DeleteAlarms

Deletes the specified alarms. This is used during cleanup to ensure that unused CloudWatch alarms are removed after a scaling policy or resource is deleted.

"Sid": "TaggingAndMetadata"

Enables resource organization, cost tracking, and access control via metadata tags.

Policy Action

Description

sagemaker:AddTags

Grants permission to add or overwrite one or more tags for a specified SageMaker resource (e.g., notebook instances, models, or training jobs).

sagemaker:DeleteTags

Grants permission to remove one or more specific tags from a SageMaker resource.

sagemaker:ListTags

Grants permission to view/list all tags currently associated with a specific SageMaker resource.

"Sid": "PassRoleToSageMaker"

Required for CreateModel and CreateEndpoint — SageMaker needs an execution role to pull model artifacts from S3 and write logs. Scoped to roles matching the platform’s naming convention and conditioned to SageMaker only, preventing privilege escalation to other services.

Policy Action

Description

iam:PassRole

Assign IAM execution roles to SageMaker models and endpoints. Scoped to {company_prefix}-{env}-*-role-* and conditioned to sagemaker.amazonaws.com only.

"Sid": "ExplicitDenyDataDeletion"

Safety net to protect the SageMaker Studio environment itself. Even with full endpoint management, admins should not accidentally destroy the shared platform infrastructure. An explicit Deny ensures no other policy can override this protection.

Policy Action Denied

Description

sagemaker:DeleteDomain

Prevents the accidental deletion of the entire SageMaker Studio environment, which includes all user settings and shared resources.

sagemaker:DeleteUserProfile

Prevents the deletion of individual user profiles. Deleting a profile causes the user to lose access to their associated data, notebooks, and artifacts stored in their EFS volume.

Deployment Script:

import boto3

sagemaker = boto3.client('sagemaker')

# Create endpoint configuration
config_name = 'fraud-detection-v3-config'
sagemaker.create_endpoint_config(
    EndpointConfigName=config_name,
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'fraud-detection-v3',
        'InitialInstanceCount': 2,
        'InstanceType': 'ml.m5.xlarge'
    }]
)

# Create endpoint
endpoint_name = 'acme-prod-fraud-detection'
sagemaker.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=config_name,
    Tags=[
        {'Key': 'Environment', 'Value': 'production'},
        {'Key': 'Model', 'Value': 'fraud-detection'},
        {'Key': 'Version', 'Value': 'v3'}
    ]
)

print(f"Endpoint {endpoint_name} created successfully")

Security Note: ⚠ This level should be assigned sparingly. Most users need dev-invoke or prod-invoke.


Level 4-ci: deploy-only¶

Purpose: Automated deployment of endpoints and models without destructive or traffic-shifting actions

Principal: Machine (CI/CD pipelines, deployment automation)

Typical Users:

  • CI/CD deployment pipelines (CodePipeline, GitHub Actions, GitLab CI)

  • Automated model deployment workflows

  • Infrastructure-as-Code automation (CloudFormation, CDK)

How It Differs from full (Level 4):

  • No delete actions — pipelines deploy forward, never tear down

  • No traffic weight shifting — canary/blue-green traffic decisions are a separate human or canary-pipeline concern

  • Same create/update/invoke scope — full deployment capability across all environments

  • Machine identity only — assigned to service roles, never to human users

What You Can Do:

  • ✅ Everything in full, EXCEPT:

  • ✅ Create new endpoints and endpoint configurations

  • ✅ Update existing endpoints to new configurations

  • ✅ Register models for deployment

  • ✅ Invoke endpoints across all environments (smoke tests)

  • ✅ Configure autoscaling policies

  • ✅ Tag resources with deployment metadata

  • ✅ Pass execution roles to SageMaker

What You Cannot Do:

  • ❌ Delete endpoints, endpoint configurations, or models

  • ❌ Shift traffic weights between production variants

  • ❌ Delete SageMaker domains or user profiles

Example Scenario:

The CI/CD pipeline receives a merged PR that triggers a model deployment. It creates a new endpoint configuration with the updated model artifact, updates the production endpoint to use the new configuration, configures autoscaling, and runs a smoke test by invoking the endpoint. It cannot delete the old endpoint — that’s a separate cleanup job requiring human approval.

Sample Permissions:

[
  {
    "Sid": "SageMakerEndpointDeployment",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:UpdateEndpoint",
      "sagemaker:DescribeEndpoint",
      "sagemaker:DescribeEndpointConfig",
      "sagemaker:ListEndpoints",
      "sagemaker:ListEndpointConfigs"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ModelRegistration",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateModel",
      "sagemaker:CreateModelPackage",
      "sagemaker:CreateModelPackageGroup",
      "sagemaker:DescribeModel",
      "sagemaker:DescribeModelPackage",
      "sagemaker:DescribeModelPackageGroup",
      "sagemaker:ListModels",
      "sagemaker:ListModelPackages",
      "sagemaker:ListModelPackageGroups",
      "sagemaker:UpdateModelPackage"
    ],
    "Resource": "*"
  },
  {
    "Sid": "InferenceExecution",
    "Effect": "Allow",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AutoscalingConfiguration",
    "Effect": "Allow",
    "Action": [
      "application-autoscaling:RegisterScalableTarget",
      "application-autoscaling:PutScalingPolicy",
      "application-autoscaling:DescribeScalableTargets",
      "application-autoscaling:DescribeScalingPolicies",
      "cloudwatch:PutMetricAlarm",
      "cloudwatch:DescribeAlarms"
    ],
    "Resource": "*"
  },
  {
    "Sid": "TaggingAndMetadata",
    "Effect": "Allow",
    "Action": [
      "sagemaker:AddTags",
      "sagemaker:ListTags"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToSageMaker",
    "Effect": "Allow",
    "Action": [
      "iam:PassRole"
    ],
    "Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "sagemaker.amazonaws.com"
      }
    }
  },
  {
    "Sid": "DenyDestructiveActions",
    "Effect": "Deny",
    "Action": [
      "sagemaker:DeleteEndpoint",
      "sagemaker:DeleteEndpointConfig",
      "sagemaker:DeleteModel",
      "sagemaker:DeleteModelPackage",
      "sagemaker:DeleteModelPackageGroup",
      "sagemaker:UpdateEndpointWeightsAndCapacities",
      "sagemaker:DeleteDomain",
      "sagemaker:DeleteUserProfile"
    ],
    "Resource": "*"
  }
]

"Sid": "SageMakerEndpointDeployment"

Grants the core deployment actions: create new endpoints and configurations, update existing endpoints to new configurations, and discover/inspect all endpoints. Excludes delete actions — teardown is not a CI/CD pipeline responsibility.

Policy Action

Description

sagemaker:CreateEndpoint

Launches a new HTTPS endpoint based on a specific configuration.

sagemaker:CreateEndpointConfig

Defines the deployment blueprint specifying model, instance type, and variant configuration.

sagemaker:UpdateEndpoint

Switches an endpoint to a new configuration for rolling deployments and model version updates.

sagemaker:DescribeEndpoint

Returns detailed information about an endpoint’s current status and configuration.

sagemaker:DescribeEndpointConfig

Retrieves the settings defined in an endpoint configuration.

sagemaker:ListEndpoints

Lists all endpoints in the account for deployment verification.

sagemaker:ListEndpointConfigs

Lists all endpoint configurations for blueprint discovery.

"Sid": "ModelRegistration"

Allows the pipeline to register model artifacts and manage model packages in the SageMaker Model Registry. Includes UpdateModelPackage for automated approval workflows.

Policy Action

Description

sagemaker:CreateModel

Creates a model resource pointing to the container image and S3 model artifacts.

sagemaker:CreateModelPackage

Registers a model version in a model package group for versioned tracking.

sagemaker:CreateModelPackageGroup

Creates a new model package group to organize model versions.

sagemaker:DescribeModel

Views details of a specific model resource.

sagemaker:DescribeModelPackage

Views details of a specific model package version.

sagemaker:DescribeModelPackageGroup

Views details of a model package group.

sagemaker:ListModels

Lists all models in the account.

sagemaker:ListModelPackages

Lists model package versions within a group.

sagemaker:ListModelPackageGroups

Lists all model package groups.

sagemaker:UpdateModelPackage

Updates model package metadata, including approval status for automated promotion workflows.

"Sid": "InferenceExecution"

Grants invoke access across all environments for post-deployment smoke tests and health checks.

Policy Action

Description

sagemaker:InvokeEndpoint

Sends a synchronous request to an endpoint for real-time inference.

sagemaker:InvokeEndpointAsync

Sends an asynchronous inference request for large payloads or long processing.

"Sid": "AutoscalingConfiguration"

Allows the pipeline to configure autoscaling after deployment. Excludes DeregisterScalableTarget and DeleteScalingPolicy — scaling teardown is a destructive action.

Policy Action

Description

application-autoscaling:RegisterScalableTarget

Registers an endpoint variant as a scalable target with min/max capacity.

application-autoscaling:PutScalingPolicy

Creates or updates a scaling policy (target tracking, step, or predictive).

application-autoscaling:DescribeScalableTargets

Retrieves information about registered scalable targets.

application-autoscaling:DescribeScalingPolicies

Returns information about scaling policies for verification.

cloudwatch:PutMetricAlarm

Creates alarms that trigger scaling policies when thresholds are breached.

cloudwatch:DescribeAlarms

Retrieves alarm status for deployment verification.

"Sid": "TaggingAndMetadata"

Allows the pipeline to tag deployed resources with deployment metadata (commit hash, pipeline run ID, version). Excludes DeleteTags — tag cleanup is not a deployment concern.

Policy Action

Description

sagemaker:AddTags

Adds or overwrites tags on SageMaker resources for tracking and cost allocation.

sagemaker:ListTags

Lists tags on a resource for verification after tagging.

"Sid": "PassRoleToSageMaker"

Required for CreateModel and CreateEndpoint — SageMaker needs an execution role to pull model artifacts from S3 and write logs. Scoped to roles matching the platform’s naming convention and conditioned to SageMaker only.

Policy Action

Description

iam:PassRole

Assigns IAM execution roles to SageMaker models and endpoints. Scoped to {company_prefix}-{env}-*-role-* and conditioned to sagemaker.amazonaws.com only.

"Sid": "DenyDestructiveActions"

Explicit deny on all destructive and traffic-shifting actions. This is the core guardrail that differentiates level4-ci from level4. CI/CD pipelines deploy forward — teardown and traffic shifting require separate authorization.

Policy Action Denied

Description

sagemaker:DeleteEndpoint

(Denied) Prevents pipeline from removing production endpoints.

sagemaker:DeleteEndpointConfig

(Denied) Prevents pipeline from removing endpoint configuration blueprints.

sagemaker:DeleteModel

(Denied) Prevents pipeline from removing model resources.

sagemaker:DeleteModelPackage

(Denied) Prevents pipeline from removing model package versions.

sagemaker:DeleteModelPackageGroup

(Denied) Prevents pipeline from removing model package groups.

sagemaker:UpdateEndpointWeightsAndCapacities

(Denied) Prevents pipeline from shifting traffic between production variants. Traffic decisions should be a separate human or canary-pipeline concern.

sagemaker:DeleteDomain

(Denied) Prevents accidental deletion of the SageMaker Studio environment.

sagemaker:DeleteUserProfile

(Denied) Prevents deletion of individual user profiles and their associated data.

Security Note: ⚠ This level is designed exclusively for machine identities (service roles). Never assign to human users — humans who need full SageMaker access should use level4 (full).


Lambda Inference¶

Lambda Inference policies control access to Lambda functions that serve ML models for predictions. Lambda is ideal for lightweight, event-driven inference workloads where cold start latency is acceptable and cost optimization is a priority.

Level 1: invoke-only¶

Purpose: Call Lambda inference functions without managing them

Principal: Machine (backend services, API Gateway) or Human (developers testing)

Typical Users:

  • Backend API services calling model endpoints

  • API Gateway integrations

  • Event-driven architectures (S3 triggers, SQS consumers)

  • Developers testing inference locally

What You Can Do:

  • ✅ Invoke Lambda inference functions

  • ✅ View function configuration and metadata

  • ✅ List available inference functions

  • ✅ Check function status and last invocation

What You Cannot Do:

  • ❌ Create or delete Lambda functions

  • ❌ Modify function code or configuration

  • ❌ Change memory, timeout, or environment variables

  • ❌ Manage layers or aliases

Example Scenario:

An API Gateway route receives image classification requests from a mobile app. It invokes a Lambda function that loads a lightweight PyTorch model and returns the predicted label. The API service only needs invoke permission — it never modifies the function.

Sample Permissions:

[
  {
    "Sid": "LambdaDiscoveryListActions",
    "Effect": "Allow",
    "Action": [
      "lambda:ListFunctions"
    ],
    "Resource": "*"
  },
  {
    "Sid": "LambdaDiscoveryActions",
    "Effect": "Allow",
    "Action": [
      "lambda:ListAliases",
      "lambda:ListTags",
      "lambda:GetFunction",
      "lambda:GetFunctionConfiguration",
      "lambda:GetPolicy",
      "lambda:GetAlias",
      "lambda:GetFunctionUrlConfig",
      "lambda:ListFunctionUrlConfigs",
      "lambda:GetProvisionedConcurrencyConfig",
      "lambda:ListProvisionedConcurrencyConfigs"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaInvocationActions",
    "Effect": "Allow",
    "Action": [
      "lambda:InvokeFunction",
      "lambda:InvokeFunctionUrl",
      "lambda:GetFunctionEventInvokeConfig",
      "lambda:ListFunctionEventInvokeConfigs",
      "lambda:GetFunctionConcurrency"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  }
]

"Sid": "LambdaDiscoveryListActions"

Grants permissions to list Lambda functions, their aliases, and associated tags. This allows users to discover available inference endpoints and understand their organization without accessing sensitive configuration details.

Policy Action

Description

lambda:ListFunctions

Retrieves a list of all Lambda functions in the region to identify inference endpoints.

"Sid": "LambdaDiscoveryActions"

Provides permissions to view detailed information about Lambda functions, including their configuration, access policies, aliases, URL configurations, and concurrency settings. This allows users to understand the capabilities and status of inference functions without modifying them.

Policy Action

Description

lambda:ListAliases

Lists all aliases for a specific function to find different deployment versions.

lambda:ListTags

Lists tags assigned to the function for resource filtering and organization.

lambda:GetFunction

Returns the configuration and a pre-signed URL to download the deployment package.

lambda:GetFunctionConfiguration

Provides specific metadata like runtime, handler, and environment variables.

lambda:GetPolicy

Retrieves the resource-based policy to verify access permissions.

lambda:GetAlias

Retrieves information about a specific function alias (e.g., ‘prod’ or ‘staging’).

lambda:GetFunctionUrlConfig

Returns the URL configuration for functions used as direct HTTP(S) endpoints.

lambda:ListFunctionUrlConfigs

Lists all URL configurations associated with a function.

lambda:GetProvisionedConcurrencyConfig

Retrieve the status and details of the Provisioned Concurrency setup for a specific function version or alias.

lambda:ListProvisionedConcurrencyConfigs

Lists all provisioned concurrency configurations for a function to assess scaling readiness.

"Sid": "LambdaInvocationActions"

Grants permissions to execute Lambda functions and manage asynchronous execution flows. This allows users to invoke inference functions for predictions while still preventing any modifications to the function code or configuration.

Policy Action

Description

lambda:InvokeFunction

The primary action for synchronous or asynchronous execution of the inference code.

lambda:InvokeFunctionUrl

Enables execution via the built-in Lambda HTTP(S) endpoint.

lambda:GetFunctionEventInvokeConfig

Retrieves configuration for asynchronous delivery, such as destination and retry attempts.

lambda:ListFunctionEventInvokeConfigs

Lists all asynchronous invocation configurations for the function.

lambda:GetFunctionConcurrency

Allows checking if the function has enough reserved capacity to handle the expected inference load.


Level 2: deploy-manage¶

Purpose: Deploy and configure Lambda inference functions

Principal: Human (ML engineers, DevOps) or Machine (CI/CD pipelines)

Typical Users:

  • ML engineers packaging models into Lambda functions

  • DevOps engineers configuring memory, timeout, and concurrency

  • CI/CD pipelines deploying new model versions

  • Data scientists publishing lightweight models

How It Differs from invoke-only (Level 1):

  • Adds deployment — create, update, and publish function versions

  • Adds configuration — modify memory, timeout, environment variables, layers

  • Adds alias management — create aliases for blue/green and canary deployments

  • Still no delete — function removal requires Level 3

What You Can Do:

  • ✅ Everything in invoke-only, PLUS:

  • ✅ Create new Lambda inference functions

  • ✅ Update function code with new model versions

  • ✅ Configure memory, timeout, and concurrency settings

  • ✅ Manage function aliases for traffic shifting

  • ✅ Add and update Lambda layers (model dependencies)

  • ✅ Set environment variables (model paths, feature flags)

What You Cannot Do:

  • ❌ Delete Lambda functions

  • ❌ Modify IAM execution roles

  • ❌ Change VPC or security group settings

Example Scenario:

An ML engineer has retrained the image classification model and needs to deploy the new version. They update the Lambda function code, publish a new version, and shift 10% of traffic to the new version via a weighted alias — all without touching the production alias until validation passes.

Sample Permissions:

[
  {
    "Sid": "LambdaGlobalDiscovery",
    "Effect": "Allow",
    "Action": [
      "lambda:GetAccountSettings",
      "lambda:ListFunctions",
      "lambda:ListLayers",
      "lambda:ListLayerVersions",
      "lambda:ListCodeSigningConfigs",
      "lambda:ListEventSourceMappings"
    ],
    "Resource": "*"
  },
  {
    "Sid": "LambdaFunctionDiscovery",
    "Effect": "Allow",
    "Action": [
      "lambda:GetAlias",
      "lambda:GetFunction",
      "lambda:GetFunctionCodeSigningConfig",
      "lambda:GetFunctionConcurrency",
      "lambda:GetFunctionConfiguration",
      "lambda:GetFunctionEventInvokeConfig",
      "lambda:GetFunctionUrlConfig",
      "lambda:GetPolicy",
      "lambda:GetProvisionedConcurrencyConfig",
      "lambda:GetRuntimeManagementConfig",
      "lambda:ListAliases",
      "lambda:ListFunctionEventInvokeConfigs",
      "lambda:ListFunctionUrlConfigs",
      "lambda:ListProvisionedConcurrencyConfigs",
      "lambda:ListTags",
      "lambda:ListVersionsByFunction"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaLayerDiscovery",
    "Effect": "Allow",
    "Action": [
      "lambda:GetLayerVersion",
      "lambda:GetLayerVersionPolicy"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:layer:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaInvocation",
    "Effect": "Allow",
    "Action": [
      "lambda:InvokeFunction",
      "lambda:InvokeFunctionUrl"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaDeploymentAndConfiguration",
    "Effect": "Allow",
    "Action": [
      "lambda:CreateFunction",
      "lambda:UpdateFunctionCode",
      "lambda:UpdateFunctionConfiguration",
      "lambda:PublishVersion",
      "lambda:CreateAlias",
      "lambda:UpdateAlias",
      "lambda:PutFunctionConcurrency",
      "lambda:PutFunctionEventInvokeConfig",
      "lambda:PutProvisionedConcurrencyConfig",
      "lambda:CreateFunctionUrlConfig",
      "lambda:UpdateFunctionUrlConfig",
      "lambda:TagResource"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaLayerManagement",
    "Effect": "Allow",
    "Action": [
      "lambda:PublishLayerVersion"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:layer:{company_prefix}-{env}-*"
  },
  {
    "Sid": "PassRoleToLambda",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "lambda.amazonaws.com"
      }
    }
  },
  {
    "Sid": "DenyDeleteAndPermissionChanges",
    "Effect": "Deny",
    "Action": [
      "lambda:DeleteFunction",
      "lambda:DeleteAlias",
      "lambda:DeleteFunctionUrlConfig",
      "lambda:DeleteFunctionConcurrency",
      "lambda:DeleteFunctionEventInvokeConfig",
      "lambda:DeleteProvisionedConcurrencyConfig",
      "lambda:DeleteLayerVersion",
      "lambda:AddPermission",
      "lambda:RemovePermission"
    ],
    "Resource": "*"
  }
]

"Sid": "LambdaGlobalDiscovery"

Grants account-level and cross-function read access for actions that do not support resource-level scoping. These actions require Resource: * per the AWS Service Authorization Reference.

Policy Action

Description

lambda:GetAccountSettings

Returns account-level limits and usage such as concurrent execution quotas.

lambda:ListFunctions

Retrieves a list of all Lambda functions in the region.

lambda:ListLayers

Lists all Lambda layers available in the region.

lambda:ListLayerVersions

Lists published versions of a specific layer.

lambda:ListCodeSigningConfigs

Lists code signing configurations in the account.

lambda:ListEventSourceMappings

Lists event source mappings in the account.

"Sid": "LambdaFunctionDiscovery"

Grants read-only access to function-level metadata, configuration, aliases, concurrency settings, and URL configurations. All actions are scoped to tenant-prefixed functions.

Policy Action

Description

lambda:GetAlias

Returns details about a specific function alias.

lambda:GetFunction

Returns the function configuration and a pre-signed URL for the deployment package.

lambda:GetFunctionCodeSigningConfig

Returns the code signing config attached to a function.

lambda:GetFunctionConcurrency

Returns the reserved concurrency configuration for a function.

lambda:GetFunctionConfiguration

Returns version-specific settings such as runtime, handler, memory, and timeout.

lambda:GetFunctionEventInvokeConfig

Returns the asynchronous invocation configuration (retries, destinations).

lambda:GetFunctionUrlConfig

Returns the function URL configuration for direct HTTP(S) access.

lambda:GetPolicy

Returns the resource-based policy attached to the function.

lambda:GetProvisionedConcurrencyConfig

Returns the provisioned concurrency configuration for an alias or version.

lambda:GetRuntimeManagementConfig

Returns the runtime management configuration (auto or manual updates).

lambda:ListAliases

Lists all aliases for a specific function.

lambda:ListFunctionEventInvokeConfigs

Lists asynchronous invocation configurations for a function.

lambda:ListFunctionUrlConfigs

Lists URL configurations associated with a function.

lambda:ListProvisionedConcurrencyConfigs

Lists provisioned concurrency configurations for a function.

lambda:ListTags

Lists tags assigned to the function.

lambda:ListVersionsByFunction

Lists published versions of a function.

"Sid": "LambdaLayerDiscovery"

Grants read-only access to layer version details and policies. Layer actions require a layer ARN, not a function ARN, so they are scoped separately.

Policy Action

Description

lambda:GetLayerVersion

Returns details about a specific layer version, including the download URL.

lambda:GetLayerVersionPolicy

Returns the resource-based policy for a layer version.

"Sid": "LambdaInvocation"

Grants permission to execute Lambda inference functions via direct invocation or function URLs. Scoped to tenant-prefixed functions.

Policy Action

Description

lambda:InvokeFunction

Sends a synchronous or asynchronous request to execute the function.

lambda:InvokeFunctionUrl

Invokes the function via its built-in HTTP(S) endpoint.

"Sid": "LambdaDeploymentAndConfiguration"

Grants permissions to create functions, deploy new code versions, configure runtime settings, manage aliases for traffic shifting, and set concurrency. Does not include delete actions.

Policy Action

Description

lambda:CreateFunction

Creates a new Lambda function with the specified code and configuration.

lambda:UpdateFunctionCode

Deploys new code to an existing function (e.g., updated model artifact).

lambda:UpdateFunctionConfiguration

Modifies function settings such as memory, timeout, and environment variables.

lambda:PublishVersion

Creates an immutable snapshot of the current function code and configuration.

lambda:CreateAlias

Creates a named alias pointing to a function version for traffic routing.

lambda:UpdateAlias

Updates an alias to point to a different version or adjust traffic weights.

lambda:PutFunctionConcurrency

Sets reserved concurrency to guarantee execution capacity.

lambda:PutFunctionEventInvokeConfig

Configures asynchronous invocation settings (retries, destinations).

lambda:PutProvisionedConcurrencyConfig

Allocates provisioned concurrency to reduce cold starts.

lambda:CreateFunctionUrlConfig

Creates an HTTP(S) endpoint for direct function invocation.

lambda:UpdateFunctionUrlConfig

Modifies the function URL configuration.

lambda:TagResource

Adds or updates tags on the function for organization and cost tracking.

"Sid": "LambdaLayerManagement"

Grants permission to publish new layer versions containing model dependencies, shared libraries, or custom runtimes. Layer actions require a layer ARN, scoped separately from functions.

Policy Action

Description

lambda:PublishLayerVersion

Publishes a new version of a layer with updated dependencies or libraries.

"Sid": "PassRoleToLambda"

Allows passing an IAM execution role to Lambda when creating or updating functions. Scoped to tenant-prefixed roles and conditioned to the Lambda service only.

Policy Action

Description

iam:PassRole

Passes an IAM role to Lambda as the function’s execution role.

"Sid": "DenyDeleteAndPermissionChanges"

Explicitly prevents deletion of functions, aliases, layers, concurrency configs, and URL configs. Also blocks changes to resource-based policies (AddPermission/RemovePermission) which control cross-account access. These destructive and permission-escalation actions are reserved for Level 3 (full).

Policy Action Denied

Description

lambda:DeleteFunction

(Denied) Deletes a Lambda function and all its versions.

lambda:DeleteAlias

(Denied) Deletes a function alias.

lambda:DeleteFunctionUrlConfig

(Denied) Removes the function URL endpoint.

lambda:DeleteFunctionConcurrency

(Denied) Removes reserved concurrency from a function.

lambda:DeleteFunctionEventInvokeConfig

(Denied) Removes asynchronous invocation configuration.

lambda:DeleteProvisionedConcurrencyConfig

(Denied) Removes provisioned concurrency allocation.

lambda:DeleteLayerVersion

(Denied) Deletes a published layer version.

lambda:AddPermission

(Denied) Adds a statement to the function’s resource-based policy.

lambda:RemovePermission

(Denied) Removes a statement from the function’s resource-based policy.


Level 3: full¶

Purpose: Complete Lambda inference function lifecycle management

Principal: Human (platform admins, MLOps leads)

Typical Users:

  • Platform administrators

  • MLOps team leads

  • Infrastructure engineers

How It Differs from deploy-manage (Level 2):

  • Adds delete lifecycle — can remove functions, aliases, layers, versions, configs

  • Adds resource policy management — AddPermission/RemovePermission for cross-account invocation control

  • Adds code signing enforcement — manage code signing configurations

  • Adds event source mapping management — create/update/delete for event-driven inference

  • Account-wide scope — no function name restrictions (lambda:*)

What You Can Do:

  • ✅ Everything in deploy-manage, PLUS:

  • ✅ Delete functions, aliases, layers, versions, and configs

  • ✅ Manage resource-based policies (cross-account invocation control)

  • ✅ Manage code signing configurations

  • ✅ Create, update, and delete event source mappings

  • ✅ Full account-wide access — not restricted to naming conventions

What You Cannot Do:

  • ❌ Nothing — full Lambda inference management

Example Scenario:

The platform team is decommissioning a retired product line. They need to delete the associated Lambda inference functions, remove their event source mappings, clean up resource-based policies that granted cross-account access, and delete the Lambda layers that were dedicated to those functions.

Sample Permissions:

[
  {
    "Sid": "LambdaFullAccess",
    "Effect": "Allow",
    "Action": "lambda:*",
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToLambda",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "lambda.amazonaws.com"
      }
    }
  }
]

Why lambda:* instead of explicit action enumeration? Lambda’s API surface grows frequently as AWS adds new features. Unlike SageMaker — where new actions can spin up expensive training jobs or endpoints — new Lambda actions have low cost and blast-radius impact. Maintaining an explicit list of 50+ actions creates maintenance debt that leads to stale policies and broken deployments when AWS adds actions. The Resource: * scope is acceptable here because Level 3 principals are platform administrators who need to govern the entire account, including functions that may not follow naming conventions.

"Sid": "LambdaFullAccess"

Grants full Lambda access across all resource types in the account — functions, layers, event source mappings, code signing configs, and any future Lambda resource types. This is the administrative level for platform teams who manage the complete Lambda inference lifecycle.

Policy Action

Description

lambda:*

All Lambda actions — create, read, update, delete, invoke, and manage across all Lambda resource types.

"Sid": "PassRoleToLambda"

Allows passing an IAM execution role to Lambda when creating or updating functions. Even at full Lambda access, PassRole remains scoped to tenant-prefixed roles to prevent privilege escalation via arbitrary role attachment.

Policy Action

Description

iam:PassRole

Passes an IAM role to Lambda as the function’s execution role. Scoped to {company_prefix}-{env}-*-role-* and conditioned to the Lambda service only.

TODO: VPC Governance Subsection Document two config-driven VPC condition key patterns that apply to both Level 2 and Level 3:

  • Enforce specific VPC — Deny UpdateFunctionConfiguration when lambda:VpcIds doesn’t match config value

  • Deny all VPC — Deny UpdateFunctionConfiguration when lambda:VpcIds is present Config schema: lambda_inference.vpc_policy (“enforce” | “deny” | “none”)


Bedrock Inference¶

Bedrock Inference policies control access to foundation models (FMs) for generative AI workloads. Unlike SageMaker where you manage your own endpoints, Bedrock is a fully managed service — the policy focus is on which models can be invoked, where inference runs (cross-region), and how much throughput is provisioned.

Level 1: invoke-only¶

Purpose: Call foundation models for predictions without managing model access or throughput

Principal: Machine (backend services, chatbots) or Human (developers, analysts)

Typical Users:

  • Backend services integrating generative AI

  • Customer-facing chatbots and assistants

  • Developers prototyping with foundation models

  • Analysts using text summarization or classification

What You Can Do:

  • ✅ Invoke allowed foundation models

  • ✅ Use the Converse API for chat-based interactions

  • ✅ List available foundation models

  • ✅ View model details and capabilities

What You Cannot Do:

  • ❌ Enable or disable model access

  • ❌ Create or manage provisioned throughput

  • ❌ Configure cross-region inference

  • ❌ Manage custom models or fine-tuning jobs

  • ❌ Create or modify guardrails

Example Scenario:

A customer support chatbot needs to call Claude for generating responses. The service role can invoke the model but cannot change which models are available or provision dedicated throughput.

Sample Permissions:

[
  {
    "Sid": "BedrockDiscovery",
    "Effect": "Allow",
    "Action": [
      "bedrock:ListFoundationModels",
      "bedrock:GetFoundationModel"
    ],
    "Resource": "*"
  },
  {
    "Sid": "BedrockStandardInference",
    "Effect": "Allow",
    "Action": [
      "bedrock:InvokeModel",
      "bedrock:InvokeModelWithResponseStream"
    ],
    "Resource": "*"
  },
  {
    "Sid": "BedrockConverseInference",
    "Effect": "Allow",
    "Action": [
      "bedrock:Converse",
      "bedrock:ConverseStream"
    ],
    "Resource": "*"
  }
]

"Sid": "BedrockDiscovery"

Allows users to discover available foundation models and view their specific capabilities and details.

Policy Action

Description

bedrock:ListFoundationModels

Lists the foundation models available in Amazon Bedrock, which is necessary to identify which models can be invoked.

bedrock:GetFoundationModel

Retrieves detailed information about a specific foundation model, such as input/output modalities and customization support.

"Sid": "BedrockStandardInference"

Enables the core ability to send prompts and receive responses from foundation models, including streaming and chat-specific APIs.

Policy Action

Description

bedrock:InvokeModel

Sends a prompt to a specified model and receives the entire response in a single payload.

bedrock:InvokeModelWithResponseStream

Sends a prompt to a model and receives the response as a series of tokens (streaming), ideal for real-time applications.

"Sid": "BedrockConverseInference"

Enables multi-turn chat interactions via the Converse API. Kept as a separate Sid from BedrockStandardInference for three reasons:

  • Resource-level control — allows scoping Converse to specific foundation models or inference profiles independently from InvokeModel (e.g., for cost tracking)

  • Streaming restrictions — if compliance requires disabling streaming (which can bypass certain content inspection or logging), ConverseStream can be split into its own Sid

  • Auditability — separate Sids make it easier to identify which statement granted a specific permission in IAM policy evaluation

Note: bedrock:Converse and bedrock:ConverseStream are functional Bedrock API actions but were not listed in the AWS IAM Service Authorization Reference at time of writing. If they authorize via InvokeModel under the hood, having them listed explicitly does not affect policy behavior.

Policy Action

Description

bedrock:Converse

Provides a consistent API for multi-turn chat conversations, managing message history and formatting for supported models.

bedrock:ConverseStream

Allows for multi-turn chat conversations with the benefit of streaming responses for lower perceived latency.


Level 2: model-manage¶

Purpose: Manage model access, guardrails, and inference configurations

Principal: Human (ML engineers, AI/ML team leads)

Typical Users:

  • ML engineers configuring model access for teams

  • AI/ML team leads managing guardrails and content filters

  • DevOps engineers setting up cross-region inference

  • Data scientists managing custom model imports

How It Differs from invoke-only (Level 1):

  • Adds model access management — enable/disable foundation models for the account

  • Adds guardrail management — create and configure content filters and safety controls

  • Adds cross-region configuration — control where inference requests are routed

  • Still no throughput provisioning or deletion — cost-impacting decisions require Level 3

What You Can Do:

  • ✅ Everything in invoke-only, PLUS:

  • ✅ Enable and disable foundation model access

  • ✅ Create and configure guardrails (content filters, topic blocks)

  • ✅ Manage custom model imports

  • ✅ Configure cross-region inference profiles

  • ✅ View usage metrics and invocation logs

What You Cannot Do:

  • ❌ Create or delete provisioned throughput (cost-impacting)

  • ❌ Delete guardrails

  • ❌ Manage account-level Bedrock settings

Example Scenario:

The AI/ML team lead needs to enable a new Anthropic model for the development team, create a guardrail that blocks PII in model responses, and configure cross-region inference so requests can fail over to us-east-1 if us-west-2 is at capacity.

[
  {
    "Sid": "BedrockStandardInference",
    "Effect": "Allow",
    "Action": [
      "bedrock:InvokeModel",
      "bedrock:InvokeModelWithResponseStream"
    ],
    "Resource": "*"
  },
  {
    "Sid": "BedrockConverseInference",
    "Effect": "Allow",
    "Action": [
      "bedrock:Converse",
      "bedrock:ConverseStream"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ModelAccessManagement",
    "Effect": "Allow",
    "Action": [
      "bedrock:GetFoundationModel",
      "bedrock:ListFoundationModels",
      "bedrock:PutModelInvocationLoggingConfiguration",
      "bedrock:GetModelInvocationLoggingConfiguration",
      "bedrock:ListModelInvocationJobs",
      "bedrock:PutFoundationModelEntitlement",
      "bedrock:PutUseCaseForModelAccess",
      "bedrock:ListFoundationModelAgreementOffers",
      "bedrock:CreateFoundationModelAgreement",
      "bedrock:GetFoundationModelAvailability",
      "bedrock:DeleteFoundationModelAgreement"
    ],
    "Resource": "*"
  },
  {
    "Sid": "GuardrailManagement",
    "Effect": "Allow",
    "Action": [
      "bedrock:CreateGuardrail",
      "bedrock:UpdateGuardrail",
      "bedrock:CreateGuardrailVersion",
      "bedrock:GetGuardrail",
      "bedrock:ListGuardrails"
    ],
    "Resource": "*"
  },
  {
    "Sid": "CustomModelImportManagement",
    "Effect": "Allow",
    "Action": [
      "bedrock:ImportModel",
      "bedrock:GetCustomModel",
      "bedrock:ListCustomModels",
      "bedrock:CreateModelImportJob",
      "bedrock:GetModelImportJob",
      "bedrock:ListModelImportJobs",
      "bedrock:StopModelImportJob"
    ],
    "Resource": "*"
  },
  {
    "Sid": "CrossRegionInferenceManagement",
    "Effect": "Allow",
    "Action": [
      "bedrock:CreateInferenceProfile",
      "bedrock:GetInferenceProfile",
      "bedrock:ListInferenceProfiles",
      "bedrock:UpdateInferenceProfile"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ObservabilityAndMetrics",
    "Effect": "Allow",
    "Action": [
      "cloudwatch:GetMetricData",
      "cloudwatch:ListMetrics",
      "logs:DescribeLogGroups",
      "logs:GetLogEvents"
    ],
    "Resource": "*"
  },
  {
      "Sid": "DenyBedrockDeleteOperations",
      "Effect": "Deny",
      "Action": [
          "bedrock:DeleteCustomModel",
          "bedrock:DeleteModelInvocationLoggingConfiguration",
          "bedrock:DeleteProvisionedModelThroughput",
          "bedrock:DeleteModelImportJob",
          "bedrock:DeleteCustomModelDeployment",
          "bedrock:DeleteInferenceProfile",
          "bedrock:DeletePromptRouter",
          "bedrock:DeleteGuardrail",
          "bedrock:DeleteKnowledgeBase",
          "bedrock:DeleteAgent"
      ],
      "Resource": "*"
  }
]

"Sid": "BedrockStandardInference"

Enables the core ability to send prompts and receive responses from foundation models, including streaming and chat-specific APIs.

Policy Action

Description

bedrock:InvokeModel

Sends a prompt to a specified model and receives the entire response in a single payload.

bedrock:InvokeModelWithResponseStream

Sends a prompt to a model and receives the response as a series of tokens (streaming), ideal for real-time applications.

"Sid": "BedrockConverseInference"

Multi-turn chat interactions via the Converse API. Kept as a separate Sid for independent resource scoping, streaming control, and auditability (see Level 1 rationale).

Policy Action

Description

bedrock:Converse

Provides a consistent API for multi-turn chat conversations, managing message history and formatting for supported models.

bedrock:ConverseStream

Allows for multi-turn chat conversations with the benefit of streaming responses for lower perceived latency.

"Sid": "ModelAccessManagement"

Permissions to enable, disable, and manage entitlements for foundation models within the account.

Policy Action

Description

bedrock:GetFoundationModel

Retrieves detailed information and properties about a specific Amazon Bedrock foundation model.

bedrock:ListFoundationModels

Lists all foundation models available in Amazon Bedrock for the current region.

bedrock:PutModelInvocationLoggingConfiguration

Configures where to store model invocation logs, such as S3 buckets or CloudWatch Logs.

bedrock:GetModelInvocationLoggingConfiguration

Retrieves the current configuration for model invocation logging.

bedrock:ListModelInvocationJobs

Lists asynchronous model invocation jobs to track batch processing status.

bedrock:PutFoundationModelEntitlement

Submits a request for foundation model entitlement. Largely automated for most models but still required for certain provider-specific access flows.

bedrock:PutUseCaseForModelAccess

Submits the required provider use-case form for first-time model access, such as the Anthropic use-case disclosure. One-time per account or organization.

bedrock:ListFoundationModelAgreementOffers

Grants permission to view available agreement offers for foundation models.

bedrock:CreateFoundationModelAgreement

Grants permission to officially accept an offer and create a new agreement for a foundation model.

bedrock:GetFoundationModelAvailability

Grants permission to check if a specific foundation model is available for use in your account or region.

bedrock:DeleteFoundationModelAgreement

Grants permission to terminate or delete an existing foundation model agreement.

"Sid": "GuardrailManagement"

Permissions to create and configure safety controls, content filters, and PII masking without deletion rights.

Policy Action

Description

bedrock:CreateGuardrail

Creates a new guardrail to filter sensitive content or block specific topics in model responses.

bedrock:UpdateGuardrail

Modifies existing guardrail configurations, such as updating filter strengths or blocked words.

bedrock:CreateGuardrailVersion

Creates a snapshot version of a guardrail for consistent deployment across environments.

bedrock:GetGuardrail

Retrieves the detailed configuration of a specific guardrail.

bedrock:ListGuardrails

Lists all guardrails defined in the account.

"Sid": "CustomModelImportManagement"

Manages custom model imports and cross-region routing profiles for high availability.

Policy Action

Description

bedrock:ImportModel

Initiates the process of importing a custom model into Amazon Bedrock.

bedrock:GetCustomModel

Retrieves details about a custom or imported model.

bedrock:ListCustomModels

Lists all custom models available in the account.

bedrock:CreateModelImportJob

Starts the process of importing a custom model into Bedrock.

bedrock:GetModelImportJob

Retrieves detailed information and the current status of a specific import job.

bedrock:ListModelImportJobs

Returns a list of all model import jobs submitted.

bedrock:StopModelImportJob

Immediately cancels a model import job that is currently in progress.

"Sid": "CrossRegionInferenceManagement"

Enables managing and tracking model usage across one or multiple AWS regions.

Policy Action

Description

bedrock:CreateInferenceProfile

Sets up cross-region inference profiles to manage request routing and failover.

bedrock:GetInferenceProfile

Retrieves details about a specific inference profile.

bedrock:ListInferenceProfiles

Lists available inference profiles for the account.

bedrock:UpdateInferenceProfile

Grants permission to modify the settings of an existing application inference profile, such as updating its description or configuration.

"Sid": "ObservabilityAndMetrics"

Grants permissions to access CloudWatch metrics and logs related to Bedrock model invocations for monitoring and troubleshooting.

Policy Action

Description

cloudwatch:GetMetricData

Grants permission to retrieve batch amounts of CloudWatch metric data and perform metric math on the retrieved data.

cloudwatch:ListMetrics

Grants permission to retrieve a list of valid metrics stored for the AWS account owner, which can then be used to get statistical data.

logs:DescribeLogGroups

Grants permission to return all log groups associated with the requesting AWS account, including data sources that ingest into them.

logs:GetLogEvents

Grants permission to retrieve individual log events from a specific log stream, with the ability to filter results by time range.

"Sid": "DenyBedrockDeleteOperations"

Denies delete operations within Amazon Bedrock.

Policy Action

Description

bedrock:DeleteCustomModel

Deletes a custom model that was previously created through model customization (fine-tuning).

bedrock:DeleteModelInvocationLoggingConfiguration

Removes the configuration that logs model inputs and outputs to S3 or CloudWatch, which is often used for auditing.

bedrock:DeleteProvisionedModelThroughput

Deletes a Provisioned Throughput reservation; note that this typically cannot be done before a commitment term ends.

bedrock:DeleteModelImportJob

Deletes a record or job associated with importing a customized model from other environments like Amazon SageMaker.

bedrock:DeleteCustomModelDeployment

Stops and removes a deployed custom model, making its ARN unavailable for further inference.

bedrock:DeleteInferenceProfile

Deletes an inference profile, which is used to manage and track model invocation across different regions or configurations.

bedrock:DeletePromptRouter

Removes a prompt router used to direct incoming requests to specific models or versions.

bedrock:DeleteGuardrail

Deletes a Bedrock Guardrail, which provides content filtering and safety controls for generative AI applications.

bedrock:DeleteKnowledgeBase

Deletes a Knowledge Base resource used for Retrieval-Augmented Generation (RAG) workflows.

bedrock:DeleteAgent

Deletes an Amazon Bedrock Agent that automates tasks by interacting with foundation models and other AWS services.


Level 3: full¶

Purpose: Complete Bedrock platform management including cost-impacting operations

Principal: Human (platform admins, cloud architects)

Typical Users:

  • Platform administrators

  • Cloud architects

  • FinOps engineers (provisioned throughput decisions)

  • Security team (account-level controls)

How It Differs from model-manage (Level 2):

  • Adds provisioned throughput — create, modify, and delete dedicated model capacity (significant cost)

  • Adds delete operations — can remove guardrails, custom models, inference profiles, agents, knowledge bases

  • Adds account-level settings — manage Bedrock service-level configurations

  • Full fine-tuning control — create and manage model customization jobs

  • Full platform governance — agents, knowledge bases, evaluations, prompt routers, batch inference

What You Can Do:

  • ✅ Everything in model-manage, PLUS:

  • ✅ Create and delete provisioned throughput (dedicated capacity)

  • ✅ Delete guardrails, custom models, inference profiles, agents, knowledge bases

  • ✅ Manage model fine-tuning and customization jobs

  • ✅ Configure account-level Bedrock settings

  • ✅ Manage agents, knowledge bases, evaluations, and prompt routers

What You Cannot Do:

  • ❌ Nothing — full Bedrock management

Example Scenario:

The platform team needs to provision dedicated throughput for the production chatbot ahead of a product launch, clean up unused guardrails from a decommissioned project, and configure account-level logging for all Bedrock invocations.

Sample Permissions:

[
  {
    "Sid": "BedrockFullAccess",
    "Effect": "Allow",
    "Action": "bedrock:*",
    "Resource": "*"
  },
  {
    "Sid": "BedrockFullObservability",
    "Effect": "Allow",
    "Action": [
      "cloudwatch:GetMetricData",
      "cloudwatch:ListMetrics",
      "logs:DescribeLogGroups",
      "logs:GetLogEvents"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToBedrock",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "bedrock.amazonaws.com"
      }
    }
  }
]

Why bedrock:* instead of explicit action enumeration? Bedrock’s API surface is growing rapidly as AWS adds agents, knowledge bases, evaluations, prompt routers, batch inference, marketplace models, and more. Unlike SageMaker — where new actions can spin up expensive training jobs or endpoints — Bedrock’s cost-impacting actions are limited to provisioned throughput, which admins explicitly manage. Maintaining an explicit list of 80+ actions creates maintenance debt that leads to stale policies and broken deployments when AWS adds actions. The Resource: * scope is acceptable here because Level 3 principals are platform administrators who need to govern the entire account.

"Sid": "BedrockFullAccess"

Grants full Bedrock access across all resource types in the account — foundation models, custom models, guardrails, inference profiles, provisioned throughput, agents, knowledge bases, evaluations, and any future Bedrock resource types. This is the administrative level for platform teams who manage the complete Bedrock lifecycle.

Policy Action

Description

bedrock:*

All Bedrock actions — invoke, model access, guardrails, provisioned throughput, custom models, agents, knowledge bases, evaluations, prompt routers, and all future actions.

"Sid": "BedrockFullObservability"

Grants permissions to access CloudWatch metrics and logs related to Bedrock model invocations for monitoring and troubleshooting. These are non-Bedrock namespace actions that bedrock:* does not cover.

Policy Action

Description

cloudwatch:GetMetricData

Grants permission to retrieve batch amounts of CloudWatch metric data and perform metric math on the retrieved data.

cloudwatch:ListMetrics

Grants permission to retrieve a list of valid metrics stored for the AWS account owner, which can then be used to get statistical data.

logs:DescribeLogGroups

Grants permission to return all log groups associated with the requesting AWS account, including data sources that ingest into them.

logs:GetLogEvents

Grants permission to retrieve individual log events from a specific log stream, with the ability to filter results by time range.

"Sid": "PassRoleToBedrock"

Allows passing an IAM service role to Bedrock for operations that require it, such as model customization (fine-tuning) jobs and model import jobs. Even at full Bedrock access, PassRole remains scoped to tenant-prefixed roles to prevent privilege escalation via arbitrary role attachment.

Policy Action

Description

iam:PassRole

Passes an IAM role to Bedrock as the service role for customization and import jobs. Scoped to {company_prefix}-{env}-*-role-* and conditioned to the Bedrock service only.

TODO: Config-Driven Bedrock Model Scoping

Add an inference section to client configs that controls Resource ARN generation for Bedrock invoke actions.

Config schema:

inference:
  bedrock:
    allowed_models:          # list of foundation model ID patterns
      - "anthropic.claude-3-sonnet-*"
      - "anthropic.claude-3-haiku-*"
      - "amazon.titan-embed-text-v1"
    allowed_regions:         # for cross-region inference profiles
      - "us-west-2"
      - "us-east-1"

Template generator behavior:

  • allowed_models present → Resource becomes list of arn:aws:bedrock:{region}::foundation-model/<model-id> ARNs

  • allowed_models absent or ["*"] → Resource stays "*"

  • Applies to Level 1 and Level 2 invoke Sids; Level 3 (full) always uses "*"

Per-tier defaults:

  • Startup: omit or ["*"] (no restriction, encourage exploration)

  • Medium: explicit model list (cost control)

  • Enterprise: explicit model list (compliance requirement)

Code changes needed:

  1. Add inference key to validation schemas (validation-schema-startup.yaml, medium, enterprise)

  2. Update template generator to read inference.bedrock.allowed_models and build Resource ARN list

  3. Update client configs with inference section (all 3 tiers)

  4. Add unit tests for ARN generation with and without allowed_models

Also include in inference section (for consistency with existing designs):

inference:
  sagemaker:
    endpoint_prefix: "{company_prefix}-{env}"
  lambda:
    vpc_policy: "none"       # "enforce" | "deny" | "none"
  bedrock:
    allowed_models: ["*"]
    allowed_regions: ["us-west-2"]

KMS Policies¶

KMS (Key Management Service) policies control read-only access to encryption keys used across the MLOps platform. KMS keys protect S3 objects, SageMaker model artifacts, and other sensitive data.

Level 1: read-only¶

Purpose: Verify encryption settings and key configurations without the ability to encrypt, decrypt, or modify keys.

Typical Users:

  • Operations support (verify encryption compliance)

  • Security auditors (review key policies and rotation status)

  • Compliance reviewers (confirm encryption standards)

What You Can Do:

  • ✅ View key metadata, policies, and rotation status

  • ✅ List all KMS keys and aliases in the account

  • ✅ View resource tags on keys

  • ✅ Retrieve public keys (for asymmetric keys)

What You Cannot Do:

  • ❌ No Encrypt/Decrypt: Cannot use keys to encrypt or decrypt data

  • ❌ No Key Management: Cannot create, disable, delete, or schedule deletion of keys

  • ❌ No Policy Changes: Cannot modify key policies or grants

  • ❌ No Key Rotation Changes: Cannot enable or disable automatic key rotation

Sample Permissions:

[
  {
    "Sid": "ReadOnlyAccessForAllKMSKeysInAccount",
    "Effect": "Allow",
    "Action": [
      "kms:GetPublicKey",
      "kms:GetKeyRotationStatus",
      "kms:GetKeyPolicy",
      "kms:DescribeKey",
      "kms:ListKeyPolicies",
      "kms:ListResourceTags",
      "tag:GetResources"
    ],
    "Resource": "arn:aws:kms:*:{account_id}:key/*"
  },
  {
    "Sid": "ReadOnlyAccessForOperationsWithNoKMSKey",
    "Effect": "Allow",
    "Action": [
      "kms:ListKeys",
      "kms:ListAliases"
    ],
    "Resource": "*"
  }
]

"Sid": "ReadOnlyAccessForAllKMSKeysInAccount"

Grants read-only access to individual KMS key metadata, scoped to the account.

Policy Action

Description

kms:GetPublicKey

Retrieve the public key of an asymmetric KMS key.

kms:GetKeyRotationStatus

Check whether automatic key rotation is enabled for a key.

kms:GetKeyPolicy

View the resource-based policy attached to a KMS key.

kms:DescribeKey

Retrieve metadata about a KMS key (creation date, state, key spec).

kms:ListKeyPolicies

List the names of key policies attached to a key.

kms:ListResourceTags

View tags associated with a KMS key.

tag:GetResources

Query resources by tag across services (supports KMS key discovery by tag).

"Sid": "ReadOnlyAccessForOperationsWithNoKMSKey"

Grants account-wide discovery actions that don’t target a specific key.

Policy Action

Description

kms:ListKeys

List all KMS key IDs in the account.

kms:ListAliases

List all key aliases for easy identification of keys by name.

Note: This policy replaces the non-existent AWSKeyManagementServiceReadOnlyAccess AWS managed policy. AWS does not provide a managed KMS read-only policy, so this is implemented as a custom policy template.


Trusted Advisor Policies¶

Trusted Advisor policies control read-only access to AWS Trusted Advisor checks and recommendations. Trusted Advisor evaluates your account against best practices for cost optimization, performance, security, fault tolerance, and service limits.

Level 1: read-only¶

Purpose: View Trusted Advisor check results and recommendations without the ability to refresh checks or modify preferences.

Typical Users:

  • FinOps managers (review cost optimization and performance recommendations)

What You Can Do:

  • ✅ View all Trusted Advisor check details and summaries

  • ✅ View flagged resources for each check

  • ✅ View account Support plan and Trusted Advisor preferences

What You Cannot Do:

  • ❌ No Refresh: Cannot refresh Trusted Advisor checks

  • ❌ No Modifications: Cannot modify Trusted Advisor preferences or notification settings

  • ❌ No Priority Access: Does not include Trusted Advisor Priority features (separate policy)

Sample Permissions:

[
  {
    "Sid": "TrustedAdvisorReadOnlyAccess",
    "Effect": "Allow",
    "Action": [
      "trustedadvisor:DescribeChecks",
      "trustedadvisor:DescribeCheckSummaries",
      "trustedadvisor:DescribeCheckItems",
      "trustedadvisor:DescribeAccount"
    ],
    "Resource": "*"
  }
]

"Sid": "TrustedAdvisorReadOnlyAccess"

Grants read-only access to Trusted Advisor checks and account information.

Policy Action

Description

trustedadvisor:DescribeChecks

View details for all Trusted Advisor checks.

trustedadvisor:DescribeCheckSummaries

View summaries of check results.

trustedadvisor:DescribeCheckItems

View specific details for flagged resources.

trustedadvisor:DescribeAccount

View Support plan and Trusted Advisor preferences.

Note: This policy replaces the non-existent AWSTrustedAdvisorReadOnlyAccess AWS managed policy. AWS does not provide a managed Trusted Advisor read-only policy. The closest alternatives are AWSTrustedAdvisorPriorityReadOnlyAccess (scoped to Priority features only) and AWSSupportAccess (broader, includes check refresh). This custom template provides least-privilege read-only access.


Combined Policies¶

Combined policies merge multiple service-level read-only policies into a single managed policy. This is required when a group needs read-only access across many services and would otherwise exceed the AWS hard limit of 10 managed policies per group.

Why Combined Policies Exist¶

AWS IAM enforces a hard cap of 10 managed policies per group (cannot be increased). Groups like operations_support need:

  • Multiple AWS managed read-only policies (CloudWatch, X-Ray, KMS, etc.)

  • Multiple customer managed service-level policies (S3, ECR, SageMaker, Lambda, Bedrock)

When the total exceeds 10, we consolidate the customer managed service-level policies into a single combined policy.

Key Constraints¶

Constraint

Limit

Managed policies per group

10 (hard cap, no increase)

Managed policy size

6,144 characters (JSON)

Customer managed policies per account

1,500 (can increase to 5,000)

Design Rules¶

  1. Individual service-level templates still exist — they are reused by other groups that don’t hit the limit

  2. Combined policies are standalone templates — the generator treats them like any other policy, no special merge logic

  3. Combined policies are group-specific — named for the group they serve (e.g., ops-services-read-only)

  4. Only combine when forced — if a group is within the 10-policy limit, use individual service-level policies

ops-services-read-only¶

Purpose: Consolidates 5 service-level read-only policies into a single managed policy for the operations_support group.

Replaces:

  • s3: level1 (read-only)

  • ecr: level1 (read-only)

  • sagemaker: level1 (read-only)

  • lambda: level1 (invoke-only)

  • bedrock: level1 (invoke-only)

Policy Budget (operations_support):

Before

Count

After

Count

AWS managed policies

7

AWS managed policies

7

Customer managed (5 individual)

5

Customer managed (1 combined)

1

Total

12 🚹

Total

8 ✅

Size Check: ~3,856 characters (JSON) — well within the 6,144 character limit.

Sids (14 total):

Service

Sids

Source Template

S3

AllowListAllBuckets, AllowReadAndVersionAccess

s3/level1-read-only

ECR

AllowECRAuth, AllowReadOnlyPullAndMetadata

ecr/level1-read-only

SageMaker

SageMakerEndpointReadOnly, CloudWatchMetricsReadOnly, AutoScalingReadOnly, ExplicitDenyInference

sagemaker/level1-read-only

Lambda

LambdaDiscoveryListActions, LambdaDiscoveryActions, LambdaInvocationActions

lambda/level1-invoke-only

Bedrock

BedrockDiscovery, BedrockStandardInference, BedrockConverseInference

bedrock/level1-invoke-only

Template Location: policies/templates/combined/ops-services-read-only.yaml

Config Usage:

operations_support:
  managed_policies:
    - CloudWatchReadOnlyAccess
    - CloudWatchLogsReadOnlyAccess
    - AWSXrayReadOnlyAccess
    - AWSKeyManagementServiceReadOnlyAccess
    - ServiceQuotasReadOnlyAccess
    - IAMReadOnlyAccess
    - AmazonSNSReadOnlyAccess
  policy_assignments:
    combined: ops-services-read-only

Maintenance Note: If any of the 5 source service-level templates change (e.g., a new action added to S3 level1), the combined policy must be updated manually to stay in sync. This is an accepted trade-off — operations_support read-only policies change infrequently.

mlops-services-a / b / c¶

Purpose: Consolidates 6 service-level deployment policies into 3 combined policies for the mlops_engineers group. Split into 3 because the 6 services combined exceed the 6,144 character managed policy size limit.

Split:

Policy

Services

Chars

Sids

mlops-services-a

S3 level2, ECR level3, Pipeline level3, SageMaker level3

~5,069

15

mlops-services-b

Lambda level2

~3,455

8

mlops-services-c

Bedrock level2

~2,896

8

Replaces:

  • s3: level2 (project-buckets-only)

  • ecr: level3 (ci-read-write)

  • pipeline: level3 (project-ci)

  • sagemaker: level3 (prod-invoke)

  • lambda: level2 (deploy-manage)

  • bedrock: level2 (model-manage)

Policy Budget (mlops_engineers):

Before

Count

After

Count

AWS managed policies

4

AWS managed policies

4

Customer managed (6 individual)

6

Customer managed (3 combined)

3

Total

10 ⚠

Total

7 ✅

Template Locations:

  • policies/templates/combined/mlops-services-a.yaml

  • policies/templates/combined/mlops-services-b.yaml

  • policies/templates/combined/mlops-services-c.yaml

Config Usage:

mlops_engineers:
  managed_policies:
    - AmazonECS_FullAccess
    - AWSCodeDeployFullAccess
    - AWSServiceCatalogEndUserFullAccess
    - CloudWatchLogsReadOnlyAccess
  policy_assignments:
    combined_a: mlops-services-a
    combined_b: mlops-services-b
    combined_c: mlops-services-c

Why 3 policies instead of 1? The 6 services combined produce ~11,320 characters of JSON — nearly double the 6,144 character managed policy size limit. Lambda level2 (3,455 chars) and Bedrock level2 (2,896 chars) are each too large to combine with the other 4 services, so each gets its own policy.

Maintenance Note: If any of the 6 source service-level templates change, the corresponding combined policy must be updated manually. Review when any source template is modified.


Assignment Recommendations¶

Typical Team Structure¶

Storage & Pipeline¶

Role

S3

ECR

Pipeline

Junior Data Scientist

read-only

read-only

read-only

Data Scientist

project-buckets-only

read-only

read-only

Senior Data Scientist

project-buckets-full

read-only

read-only

ML Engineer

project-buckets-full

dev-read-write

project-dev

MLOps Engineer

full

full

full

Backend Developer

-

read-only

-

Auditor / Compliance

read-only

read-only

read-only

Model Risk Manager

read-only

read-only

read-only

Executive / Stakeholder

-

-

read-only

CI/CD Pipeline (Role)

project-buckets-only

ci-read-write

project-ci

Platform Admin

full

full

full

Inference¶

Role

SageMaker

Lambda

Bedrock

Junior Data Scientist

read-only

-

invoke-only

Data Scientist

dev-invoke

-

invoke-only

Senior Data Scientist

dev-invoke

-

invoke-only

ML Engineer

dev-invoke

deploy-manage

model-manage

MLOps Engineer

full

full

full

Backend Developer

prod-invoke

-

invoke-only

Auditor / Compliance

read-only

-

invoke-only

Model Risk Manager

read-only

-

invoke-only

Executive / Stakeholder

-

-

invoke-only

CI/CD Pipeline (Role)

deploy-only

deploy-manage

invoke-only

Platform Admin

full

full

full

Assignment Best Practices¶

  1. Start Minimal - Begin with read-only, expand based on actual needs

  2. Time-Bound Elevation - Grant temporary full access for specific tasks, then revoke

  3. Project Isolation - Use project-only levels to prevent cross-team interference

  4. Separate Humans from Automation - Use dev-read-write for users, ci-read-write for roles

  5. Regular Reviews - Audit access quarterly, remove unused permissions


Troubleshooting¶

Common AccessDenied Scenarios¶

“Access Denied when uploading to S3”¶

Error:

An error occurred (AccessDenied) when calling the PutObject operation

Cause: You have read-only access

Solution: Request project-buckets-only or higher


“Access Denied when deleting S3 objects”¶

Error:

An error occurred (AccessDenied) when calling the DeleteObject operation

Cause: You have project-buckets-only (no delete permission)

Solution: Request project-buckets-full access


“Access Denied when pushing to ECR”¶

Error:

denied: User: arn:aws:iam::123456789012:user/john is not authorized to perform: ecr:PutImage

Cause: You have read-only ECR access

Solution: Request dev-read-write access (if you’re a human) or ci-read-write (if you’re a CI/CD pipeline)


“Access Denied when invoking SageMaker endpoint”¶

Error:

An error occurred (AccessDeniedException) when calling the InvokeEndpoint operation

Cause: You don’t have inference policy, or endpoint is in a different environment

Solution:

  • For production endpoints: Request prod-invoke access

  • For dev/staging endpoints: Request dev-invoke access


“Cannot authenticate to ECR”¶

Error:

Error response from daemon: Get https://123456789012.dkr.ecr.us-east-1.amazonaws.com/v2/: no basic auth credentials

Cause: Missing GetAuthorizationToken permission or expired token

Solution:

  1. Verify you have any ECR policy level (all include GetAuthorizationToken)

  2. Re-run: aws ecr get-login-password | docker login ...

  3. Check AWS credentials are valid: aws sts get-caller-identity


“Access Denied when starting or stopping a pipeline”¶

Error:

An error occurred (AccessDeniedException) when calling the StartPipelineExecution operation

Cause: You have read-only Pipeline access

Solution: Request project-dev (for development pipelines) or project-ci (for CI/CD roles)


“Access Denied when invoking a Lambda function”¶

Error:

An error occurred (AccessDeniedException) when calling the Invoke operation

Cause: You don’t have Lambda inference access, or the function name doesn’t match your policy’s resource scope

Solution: Request Lambda deploy-manage access. Verify the function follows the {company_prefix}-{env}-* naming convention.


“Access Denied when deleting a Lambda function”¶

Error:

An error occurred (AccessDeniedException) when calling the DeleteFunction operation

Cause: You have deploy-manage (Level 2) which explicitly denies delete operations

Solution: Delete operations require Lambda full (Level 3). Contact your platform administrator.


“Access Denied when invoking a Bedrock foundation model”¶

Error:

An error occurred (AccessDeniedException) when calling the InvokeModel operation

Cause: You don’t have Bedrock inference access, or the model hasn’t been enabled for the account

Solution:

  1. Verify you have at least invoke-only access

  2. Check that the model is enabled: someone with model-manage access must accept the model agreement first


“Access Denied when creating or deleting a Bedrock guardrail”¶

Error:

An error occurred (AccessDeniedException) when calling the CreateGuardrail operation

Cause: You have invoke-only (Level 1) which doesn’t include guardrail management

Solution:

  • To create/update guardrails: Request model-manage access

  • To delete guardrails: Request full access (Level 3) — Level 2 explicitly denies delete operations


“Access Denied when creating provisioned throughput in Bedrock”¶

Error:

An error occurred (AccessDeniedException) when calling the CreateProvisionedModelThroughput operation

Cause: Provisioned throughput is a cost-impacting operation reserved for Level 3

Solution: Request Bedrock full access. This is typically restricted to platform admins and FinOps engineers.


“Access Denied when passing a role (PassRole)”¶

Error:

An error occurred (AccessDenied) when calling the CreateFunction operation: User is not authorized to perform: iam:PassRole

Cause: Either your policy doesn’t include PassRole, or the role ARN doesn’t match the {company_prefix}-{env}-*-role-* pattern

Solution:

  1. Verify the role follows the naming convention: {company_prefix}-{env}-*-role-*

  2. Verify the PassRole condition matches the target service (e.g., lambda.amazonaws.com, bedrock.amazonaws.com)

  3. If the role name is correct, request the appropriate access level that includes PassRole


“Action explicitly denied despite having Allow permissions”¶

Error:

An error occurred (AccessDeniedException) when calling the DeleteGuardrail operation: User is not authorized to perform: bedrock:DeleteGuardrail with an explicit deny

Cause: Your policy level includes an explicit Deny statement that overrides any Allow. Lambda deploy-manage (Level 2) and Bedrock model-manage (Level 2) both include Deny blocks for destructive actions.

Solution: Explicit Deny cannot be overridden by Allow — this is by design. You need the full (Level 3) policy which removes the Deny block. Contact your platform administrator.


Security Best Practices¶

1. Principle of Least Privilege¶

Do:

  • ✅ Start with read-only access

  • ✅ Grant write access only when needed

  • ✅ Use project-only scopes when possible

  • ✅ Limit production access to specific roles

Don’t:

  • ❌ Give everyone full access “just in case”

  • ❌ Use all-environments when production-only suffices

  • ❌ Grant delete permissions without justification


2. Separation of Duties¶

Do:

  • ✅ Assign dev-read-write to IAM users (humans)

  • ✅ Assign ci-read-write to IAM roles (automation)

  • ✅ Keep development and production access separate

  • ✅ Require different people for deployment approval

Don’t:

  • ❌ Use the same credentials for humans and CI/CD

  • ❌ Give developers direct production write access

  • ❌ Allow automated systems to have full admin rights


3. Audit and Monitoring¶

Do:

  • ✅ Enable CloudTrail logging (always on)

  • ✅ Review access logs quarterly

  • ✅ Set up alerts for sensitive actions (DeleteBucket, DeleteRepository)

  • ✅ Monitor for unusual access patterns

Don’t:

  • ❌ Ignore CloudTrail logs

  • ❌ Share IAM credentials between team members

  • ❌ Disable logging to “improve performance”


4. Credential Management¶

Do:

  • ✅ Use IAM roles for EC2/ECS/Lambda (no hardcoded keys)

  • ✅ Rotate access keys every 90 days

  • ✅ Use temporary credentials (STS AssumeRole) when possible

  • ✅ Store secrets in AWS Secrets Manager, not code

Don’t:

  • ❌ Hardcode AWS credentials in code or Docker images

  • ❌ Commit credentials to Git repositories

  • ❌ Share access keys via email or Slack

  • ❌ Use root account credentials for daily work


5. Environment Isolation¶

Do:

  • ✅ Use separate AWS accounts for dev/staging/prod (ideal)

  • ✅ Use resource naming conventions (acme-mlops-dev-, acme-mlops-prod-)

  • ✅ Restrict production access to specific IAM principals

  • ✅ Require MFA for production access

Don’t:

  • ❌ Mix dev and prod resources in the same bucket/repository

  • ❌ Allow dev pipelines to access prod endpoints

  • ❌ Use the same IAM role across all environments


6. Explicit Deny for Destructive Actions¶

Do:

  • ✅ Use Deny blocks at intermediate levels (Level 2) to prevent accidental deletion

  • ✅ Reserve delete operations for full (Level 3) principals only

  • ✅ Include all service-specific delete actions in the Deny block (not just the obvious ones)

  • ✅ Document which Deny block is active at each level so users understand why Allow doesn’t work

Don’t:

  • ❌ Rely on “absence of Allow” as a safety mechanism — explicit Deny is stronger

  • ❌ Add Deny blocks at Level 3 (full) — defeats the purpose of full access

  • ❌ Forget that explicit Deny overrides any Allow, even from other attached policies


7. AI/ML Service Governance¶

Do:

  • ✅ Scope Bedrock model access using config-driven allowed_models lists per tier

  • ✅ Enforce guardrails on all production inference workloads before launch

  • ✅ Restrict provisioned throughput creation to FinOps-approved principals (Level 3 only)

  • ✅ Scope PassRole to tenant-prefixed roles ({company_prefix}-{env}-*-role-*) with service conditions

  • ✅ Use separate policy levels for model invocation vs model management

Don’t:

  • ❌ Grant bedrock:* to non-admin roles — provisioned throughput can incur significant cost

  • ❌ Allow unrestricted PassRole — this is the most common privilege escalation vector

  • ❌ Skip guardrail configuration for production Bedrock workloads

  • ❌ Let automation roles manage model access agreements — keep that as a human decision


Getting Help¶

Request Access Changes¶

Contact your MLOps platform administrator with:

  1. Current access level - What you have now

  2. Requested access level - What you need

  3. Justification - Why you need it (specific use case)

  4. Duration - Permanent or temporary (e.g., 2 weeks for project)

Report Security Issues¶

If you discover:

  • Overly permissive policies

  • Credentials in code or logs

  • Unauthorized access attempts

  • Compliance violations

Contact: security@your-company.com (replace with your security team contact)


Appendix: Policy Type Summary¶

S3 Policies¶

  • read-only - Safe exploration, no modifications

  • project-buckets-only - Standard work, no deletion

  • project-buckets-full - Senior users, cleanup capability

  • full - Platform admins only

ECR Policies¶

  • read-only - Pull images for local testing

  • dev-read-write - Humans pushing images manually

  • ci-read-write - Automation pushing images

  • full - Repository management

Pipeline Policies¶

  • read-only - View pipelines, logs, history (governance/audit)

  • project-dev - Humans creating/running pipelines (IAM users)

  • project-ci - Automation creating/running pipelines (IAM roles)

  • full - Platform-wide management

Inference Policies¶

SageMaker¶

  • read-only - View endpoint status/config (no invoke, no cost)

  • dev-invoke - Invoke dev/staging endpoints for testing

  • prod-invoke - Invoke production endpoints only

  • full - Complete endpoint lifecycle management

Lambda¶

  • read-only - View function config/status (no invoke)

  • deploy-manage - Deploy, update, and invoke functions (no delete)

  • full - Complete function lifecycle management

Bedrock¶

  • invoke-only - Call foundation models and list available models

  • model-manage - Manage model access, guardrails, imports, cross-region inference (no delete, no throughput)

  • full - Complete Bedrock platform management including provisioned throughput


Document Version: 1.0
Last Updated: 2024
Maintained By: MLOps Platform Team