Policy Guide¶

Table of Contents¶

Overview
Quick Reference
S3 Policies
ECR Policies
Pipeline Policies
Inference Policies
KMS Policies
Trusted Advisor Policies
Combined Policies
Assignment Recommendations
Troubleshooting
Security Best Practices
Getting Help
Appendix: Policy Type Summary

Overview¶

This guide helps you understand and choose the right IAM policy access levels for your MLOps team. Each policy type (S3, ECR, Pipeline, Inference) offers multiple access levels designed around real-world use cases and security best practices.

Key Principles:

Least Privilege - Start with minimal access, expand only when needed
Separation of Duties - Different access for humans vs automation
Environment Isolation - Production access requires explicit permission
Audit Trail - All actions are CloudTrail-logged for compliance

Quick Reference¶

Policy Type	Access Levels	Typical Users
S3	read-only, project-buckets-only, project-buckets-full, full	Data Scientists, ML Engineers, Admins
ECR	read-only, dev-read-write, ci-read-write, full	Developers, CI/CD Pipelines, DevOps
Pipeline	read-only, project-dev, project-ci, full	ML Engineers, MLOps Admins, Auditors, CI/CD
Inference	read-only, read-only-invoke, dev-invoke, prod-invoke, full, deploy-only	Data Scientists, Backend Developers, Business Consumers, MLOps, CI/CD

S3 Policies¶

Your S3 bucket structure follows MLOps best practices with 130+ organized folders for datasets, models, artifacts, and logs.

Level 1: read-only¶

Purpose: Grants basic read-only and discovery access across the S3 environment, while restricting object-level access to buckets that match a specific naming pattern.

Typical Users:

Junior data scientists
Business analysts
Auditors and compliance reviewers
External consultants (read-only access)

What You Can Do:

✅ Discover all buckets
✅ View bucket metadata
✅ List objects in specific buckets
✅ Read files and history
✅ See version history

"Sid": "AllowListAllBuckets"

Policy Action	Description
Discover all buckets	See a list of every S3 bucket in your AWS account via the console or CLI (`s3:ListAllMyBuckets`)
View bucket metadata	Retrieve the AWS region where any bucket is located (`s3:GetBucketLocation`), which is often required for the S3 Console to function correctly.

"Sid": "AllowReadAndVersionAccess"

Policy Action	Description
List objects in specific buckets	See the files and folders inside buckets that match the pattern `arn:aws:s3:::{company_prefix}-{env}-{tenant_id}-*` (`s3:ListBucket`).
Read files and history	Download or view the content of objects and their historical versions (if versioning is enabled) within those matching buckets (`s3:GetObject`, `s3:GetObjectVersion`).
See version history	List the different versions of files within the allowed buckets (`s3:ListBucketVersions`).

What You Cannot Do:

❌ No modifications
❌ No permission changes
❌ No access to other buckets’ content
❌ No administrative tasks

Policy Action	Description
No modifications	Cannot perform any “write” actions, such as uploading files (`s3:PutObject`), deleting files (`s3:DeleteObject`), or creating new buckets (`s3:CreateBucket`).
No permission changes	Cannot modify bucket policies or Access Control Lists (ACLs) to change who else can access the data.
No access to other buckets’ content	Users can see the names of all buckets in the account, but they cannot list or download objects from any bucket whose name doesn’t match the `{company_prefix}-{env}-{tenant_id}-*` pattern.
No administrative tasks	Cannot empty buckets, change lifecycle rules, or modify bucket settings like encryption or logging.

Example Scenario:

Sarah is a new data scientist who needs to explore existing datasets and model artifacts to understand the current ML pipeline. She doesn’t need to upload anything yet, just learn the landscape.

Sample Permissions:

[
  {
    "Sid": "AllowListAllBuckets",
    "Effect": "Allow",
    "Action": [
      "s3:ListAllMyBuckets",
      "s3:GetBucketLocation"
    ],
    "Resource": "arn:aws:s3:::*"
  },
  {
    "Sid": "AllowReadAndVersionAccess",
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:ListBucket",
      "s3:ListBucketVersions"
    ],
    "Resource": [
      "arn:aws:s3:::edge-prod-b001-*",
      "arn:aws:s3:::edge-prod-b001-*/*"
    ]
  }
]

Note: ListAllMyBuckets cannot be scoped to specific buckets (AWS limitation). Users will see bucket names across the account but can only read objects from their own tenant’s buckets.
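The sample above lists only the statements. As a minimal sketch, the following Python shows how they could be wrapped into a complete policy document with the tenant pattern filled in. The function name `build_read_only_policy` and its parameters are illustrative assumptions, not part of the platform's tooling.

```python
import json


def build_read_only_policy(company_prefix: str, env: str, tenant_id: str) -> dict:
    """Return a complete IAM policy document for the S3 read-only level.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    # Bucket ARN pattern from the guide: {company_prefix}-{env}-{tenant_id}-*
    bucket_arn = f"arn:aws:s3:::{company_prefix}-{env}-{tenant_id}-*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListAllBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
                "Resource": "arn:aws:s3:::*",
            },
            {
                "Sid": "AllowReadAndVersionAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:ListBucket",
                    "s3:ListBucketVersions",
                ],
                # Bucket ARN for list actions, object ARN for read actions.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_read_only_policy("edge", "prod", "b001"), indent=2))
```

Calling the helper with `edge`, `prod`, and `b001` reproduces the resource ARNs used in the sample permissions.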

Level 2: project-buckets-only¶

Purpose: Supports standard data science and ML engineering workflows while preventing deletions and bucket-level changes. The policy explicitly allows specific operations and relies on the absence of delete permissions (an implicit deny) to enforce the “Cannot Do” rules below.

Typical Users:

Data scientists (standard access)
ML engineers (development work)
Automated training jobs
Experimentation workflows

What You Can Do:

✅ Read-Only Everything
✅ Upload & Overwrite
✅ Create Folders
✅ Modify Object Tags

"Sid": "ListBucketsAndLocation"

Applied to all S3 resources (*), these allow the user to see the “big picture” in the AWS Console or via CLI.

Policy Action	Description
s3:ListAllMyBuckets	Allows the user to list the names of all buckets owned by the AWS account.
s3:GetBucketLocation	Allows the user to see which AWS Region (e.g., us-east-1) a specific bucket resides in.

"Sid": "BucketLevelReadAndList"

These actions apply to the bucket itself, rather than the files inside it.

Policy Action	Description
s3:ListBucket	Allows the user to list the objects (files and folders) inside the bucket.
s3:GetBucketVersioning	Allows the user to check if the bucket has Versioning enabled (which keeps a history of object changes).

"Sid": "ObjectLevelReadWriteAndTagging"

These actions allow the user to manage the actual data and metadata within the buckets.

Policy Action	Description
s3:GetObject	Allows the user to download or read a file.
s3:GetObjectVersion	Allows the user to retrieve a specific historical version of a file (if versioning is on).
s3:PutObject	Allows the user to upload new files or update existing ones.
s3:PutObjectTagging	Allows the user to add or change “tags” (key-value pairs used for organization or billing) on an object.
s3:GetObjectTagging	Allows the user to view the tags currently assigned to an object.

What You Cannot Do:

Policy Action	Description
Delete anything	There are no s3:DeleteObject or s3:DeleteBucket permissions in the policy.
Manage Permissions	The user cannot change or view Access Control Lists (ACLs) or bucket policies (no `s3:PutBucketPolicy`, `s3:GetBucketAcl`, etc.).
Access other Buckets	The user can only list/read/write to buckets starting with the prefix `edge-prod-b001-*`. Any other bucket is restricted.
Modify Bucket Settings	Aside from viewing versioning, the user cannot change bucket configurations like Encryption, Logging, or Lifecycle rules.
Perform Administrative Tasks	The user cannot create new buckets or delete existing ones.
Object Permanent Deletion	Even though a user can “Put” objects, they cannot remove them or manage object versions beyond reading them.
Manage Lifecycle or Encryption	The user cannot set up data archiving (Glacier), expiration rules, or modify server-side encryption settings.
CORS or Website Config	A user lacks permissions to configure the buckets for static website hosting or cross-origin resource sharing.

Example Scenario:

Marcus is training models and needs to upload preprocessed datasets to raw-data/project-x/ and save model artifacts to models/project-x/. He can overwrite files during iterative development but cannot accidentally delete the team’s shared datasets.

Sample Permissions:

[
  {
    "Sid": "ListBucketsAndLocation",
    "Effect": "Allow",
    "Action": [
      "s3:ListAllMyBuckets",
      "s3:GetBucketLocation"
    ],
    "Resource": "arn:aws:s3:::*"
  },
  {
    "Sid": "BucketLevelReadAndList",
    "Effect": "Allow",
    "Action": [
      "s3:ListBucket",
      "s3:GetBucketVersioning"
    ],
    "Resource": "arn:aws:s3:::edge-prod-b001-*"
  },
  {
    "Sid": "ObjectLevelReadWriteAndTagging",
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:PutObject",
      "s3:PutObjectTagging",
      "s3:GetObjectTagging"
    ],
    "Resource": [
      "arn:aws:s3:::edge-prod-b001-*",
      "arn:aws:s3:::edge-prod-b001-*/*"
    ]
  }
]
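To illustrate how the absence of delete permissions acts as an implicit deny, here is a deliberately simplified evaluator, written as a sketch. It only handles `Allow` statements with wildcard matching and ignores explicit denies, conditions, and the case-insensitive action matching that real IAM evaluation performs. The helper names and the example bucket and object names are assumptions.

```python
from fnmatch import fnmatchcase

# Statements copied from the project-buckets-only sample above.
LEVEL2_STATEMENTS = [
    {
        "Effect": "Allow",
        "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
        "Resource": "arn:aws:s3:::*",
    },
    {
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetBucketVersioning"],
        "Resource": "arn:aws:s3:::edge-prod-b001-*",
    },
    {
        "Effect": "Allow",
        "Action": [
            "s3:GetObject",
            "s3:GetObjectVersion",
            "s3:PutObject",
            "s3:PutObjectTagging",
            "s3:GetObjectTagging",
        ],
        "Resource": ["arn:aws:s3:::edge-prod-b001-*", "arn:aws:s3:::edge-prod-b001-*/*"],
    },
]


def _as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def is_allowed(statements, action, resource_arn):
    """Return True if any Allow statement matches both action and resource."""
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        action_ok = any(fnmatchcase(action, p) for p in _as_list(stmt["Action"]))
        resource_ok = any(fnmatchcase(resource_arn, p) for p in _as_list(stmt["Resource"]))
        if action_ok and resource_ok:
            return True
    return False  # nothing matched: implicit deny


if __name__ == "__main__":
    obj = "arn:aws:s3:::edge-prod-b001-datasets/raw-data/project-x/train.csv"
    print(is_allowed(LEVEL2_STATEMENTS, "s3:PutObject", obj))
    print(is_allowed(LEVEL2_STATEMENTS, "s3:DeleteObject", obj))
```

For authoritative results, the IAM policy simulator is the right tool; this sketch only shows why an action that appears in no statement is refused.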

Level 3: project-buckets-full¶

Purpose: The following IAM policy provides “Full Object Access” for your project buckets. It allows senior data scientists and leads to manage all data (including deleting objects and versions) while strictly preventing any changes to the bucket’s structure or configuration.

Typical Users:

Senior data scientists
ML team leads
Project managers (data cleanup)
Cost optimization roles

What You Can Do:

✅ Global Actions
✅ Bucket-Level Access
✅ Object-Level Management

"Sid": "AllowListAllBuckets"

Provides global visibility to see that S3 buckets exist and where they are located.

Policy Action	Description
s3:ListAllMyBuckets	Allows the user to list all buckets in the AWS account (required for viewing buckets in the AWS Console).
s3:GetBucketLocation	Allows the user to see the specific AWS Region where a bucket is hosted.

"Sid": "BucketLevelReadAndList"

Allows the user to see what is inside specific buckets (those starting with edge-prod-b001-).

Policy Action	Description
s3:ListBucket	Allows the user to list the objects (files) within a bucket.
s3:ListBucketVersions	Allows the user to list all versions of every object in the bucket.
s3:GetBucketVersioning	Allows the user to check if the bucket has versioning enabled or suspended.

"Sid": "ObjectLevelFullManagement"

Grants full control over the lifecycle and metadata of files within the edge-prod-b001- buckets.

Policy Action	Description
s3:GetObject	Allows reading/downloading a file.
s3:GetObjectVersion	Allows downloading a specific historical version of a file.
s3:PutObject	Allows uploading new files or updating existing ones.
s3:DeleteObject	Allows removing the current version of a file.
s3:DeleteObjectVersion	Allows permanently deleting a specific historical version of a file.
s3:PutObjectTagging	Allows adding or updating key-value tags on a file (often used for cost tracking or access control).
s3:GetObjectTagging	Allows viewing the tags associated with a file.
s3:AbortMultipartUpload	Allows canceling a large file upload that is currently in progress, which cleans up temporary storage parts.

What You Cannot Do:

❌ Read from or write to other buckets
❌ Administrative changes
❌ Bucket Creation or Deletion
❌ Permanent Deletions (MFA)
❌ Permissions Management
❌ Account-wide S3 Features

Policy Action	Description
Read from or write to other buckets	While a user can see the names of all buckets in the account, they cannot list the contents of, download files from, or upload files to any bucket that doesn’t start with `edge-prod-b001-`.
Administrative changes	A user cannot delete buckets, change bucket policies, or modify encryption settings (policy does not include actions like `s3:DeleteBucket` or `s3:PutBucketPolicy`).
Bucket Creation or Deletion	There are no permissions in the policy to create a brand-new bucket or delete an existing one.
Permanent Deletions (MFA)	If MFA Delete is enabled on a bucket, a user cannot permanently delete object versions without an MFA token.
Permissions Management	A user cannot grant other people access to these files because neither `s3:PutObjectAcl` nor `s3:PutBucketAcl` is included in the policy.
Account-wide S3 Features	This policy does not allow the user to manage account-level features like S3 Access Points, S3 Object Lambda, or S3 Batch Operations.

Example Scenario:

Elena is a senior ML engineer managing a project that generated 500GB of failed experiment artifacts over 6 months. She needs to delete these to reduce S3 costs while keeping successful model artifacts intact.

Why This Level Exists: The S3 provisioner creates buckets with versioning disabled by default to save costs. However, customers can enable versioning. This level includes DeleteObjectVersion so senior team members can clean up versions if needed, without requiring full admin access.

Sample Permissions:

[
  {
    "Sid": "AllowListAllBuckets",
    "Effect": "Allow",
    "Action": [
      "s3:ListAllMyBuckets",
      "s3:GetBucketLocation"
    ],
    "Resource": "arn:aws:s3:::*"
  },
  {
    "Sid": "BucketLevelReadAndList",
    "Effect": "Allow",
    "Action": [
      "s3:ListBucket",
      "s3:ListBucketVersions",
      "s3:GetBucketVersioning"
    ],
    "Resource": "arn:aws:s3:::edge-prod-b001-*"
  },
  {
    "Sid": "ObjectLevelFullManagement",
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:PutObject",
      "s3:DeleteObject",
      "s3:DeleteObjectVersion",
      "s3:PutObjectTagging",
      "s3:GetObjectTagging",
      "s3:AbortMultipartUpload"
    ],
    "Resource": [
      "arn:aws:s3:::edge-prod-b001-*",
      "arn:aws:s3:::edge-prod-b001-*/*"
    ]
  }
]
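A cleanup like the one in Elena's scenario could be sketched as follows, assuming a boto3 S3 client is passed in. The function name, the dry-run default, and the bucket and prefix in the usage note are illustrative assumptions. The calls used (`get_paginator("list_object_versions")` and `delete_objects`) rely on `s3:ListBucketVersions`, `s3:DeleteObject`, and `s3:DeleteObjectVersion`, all of which this level grants.

```python
def delete_versions_under_prefix(s3_client, bucket, prefix, dry_run=True):
    """Delete every object version and delete marker under a prefix.

    Hypothetical helper. Returns the number of entries found. With
    dry_run=True (the default) nothing is deleted, so the count can be
    reviewed first. Per-object errors in the delete response are not
    inspected in this sketch.
    """
    paginator = s3_client.get_paginator("list_object_versions")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        entries = page.get("Versions", []) + page.get("DeleteMarkers", [])
        targets = [{"Key": e["Key"], "VersionId": e["VersionId"]} for e in entries]
        total += len(targets)
        if dry_run:
            continue
        # delete_objects accepts at most 1000 keys per request.
        for start in range(0, len(targets), 1000):
            s3_client.delete_objects(
                Bucket=bucket,
                Delete={"Objects": targets[start:start + 1000], "Quiet": True},
            )
    return total
```

With boto3 installed, a dry run might look like `delete_versions_under_prefix(boto3.client("s3"), "edge-prod-b001-artifacts", "experiments/failed/")`, where the bucket and prefix are placeholders. Deleting versions is permanent, so running with `dry_run=True` first is the safer habit.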

Level 4: full¶

Purpose: This policy allows the identified administrators and engineers to perform all S3 operations, including high-level bucket management like modifying policies, versioning, lifecycle rules, replication, and bucket deletion.

Typical Users:

MLOps platform administrators
DevOps engineers
Infrastructure team
Break-glass emergency access

What You Can Do:

This is a wildcard permission that covers all 100+ S3 operations.

✅ Manage Buckets: Create new buckets in any Region and delete existing ones (a bucket’s Region cannot be changed after creation).
✅ Manage Objects: Upload, download, copy, and permanently delete files (objects).
✅ Control Security: Modify Bucket Policies, Access Control Lists (ACLs), and Public Access Block settings, potentially making data public.
✅ Configure Features: Set up lifecycle rules (like auto-archiving to Glacier), enable versioning, configure replication, and manage encryption settings.
✅ Account-Level Tasks: View storage inventory, analytics, and metrics for the entire S3 service.

What You Cannot Do:

❌ Nothing - this is full S3 access within your environment

Example Scenario:

James is the MLOps platform owner who needs to configure S3 lifecycle policies to automatically archive old model artifacts to Glacier after 90 days, reducing storage costs by 70%.
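The lifecycle rule in this scenario could be expressed as the configuration dictionary that boto3's `put_bucket_lifecycle_configuration` accepts. This is a sketch under assumptions: the helper name, the rule ID format, and the `models/` prefix are illustrative and not taken from the platform.

```python
def glacier_archive_rule(prefix: str, days: int = 90) -> dict:
    """Build a lifecycle configuration that moves objects under a prefix to Glacier.

    Hypothetical helper; the rule ID format is an arbitrary choice.
    """
    rule_id = f"archive-{prefix.strip('/').replace('/', '-')}-after-{days}d"
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                # Only objects whose key starts with this prefix are affected.
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    print(glacier_archive_rule("models/"))
```

The result would be passed as `LifecycleConfiguration=glacier_archive_rule("models/")` together with a `Bucket` name. Applying a lifecycle configuration replaces the bucket's existing one, so any current rules need to be merged in first.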

Security Note: ⚠️ This level should be assigned sparingly. Most users need project-buckets-only or project-buckets-full.

Sample Permissions:

[
  {
    "Sid": "S3FullAccessPermissions",
    "Effect": "Allow",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::*",
      "arn:aws:s3:::*/*"
    ]
  }
]

ECR Policies¶

ECR (Elastic Container Registry) stores your Docker images for ML training, inference, and pipeline components.

Enterprise Compliance Model¶

ECR policies follow a 4-level model that separates human access from automation access — critical for regulated industries (FinTech, Healthcare, Government) where compliance requires separation of duties.

Level Overview:

Level	Name	Who	Purpose
1	read-only	Runtime environments	Pull images — pure consumers
2	dev-read-write	Data scientists (humans)	Push/pull, create repos, trigger scans interactively
3	ci-read-write	CI/CD pipelines (automation)	Push/pull, create repos, validate lifecycle rules
4	full	MLOps administrators	Complete registry management including deletion

Key Distinction — Level 2 vs Level 3:

dev-read-write → Assigned to IAM users (humans)
ci-read-write → Assigned to IAM roles (CI/CD automation)

While both levels share a core set of push/pull/discovery actions, they are shaped by who uses them:

Level 2 includes ecr:StartImageScan and ecr:DescribeImageScanFindings — humans trigger and review scans interactively
Level 3 includes ecr:GetLifecyclePolicyPreview — CI pipelines validate lifecycle rules as part of infrastructure automation
Level 3 omits scan actions because CI relies on ECR’s scan-on-push setting

This separation keeps human and automated actions clearly distinguishable in audit trails, which supports the separation-of-duties evidence expected in SOC 2, HIPAA, and PCI DSS audits.

Level 1: read-only¶

Purpose: Pull images for local development and testing

Typical Users:

Data scientists (local testing)
QA engineers
Security scanners
Developers onboarding to the platform

What You Can Do:

✅ Authenticate
✅ Pull Images
✅ View Metadata
✅ Check Security

"Sid": "AllowECRAuth"

Grants the basic permission required to authenticate with Amazon ECR. This is the “handshake” step needed before any other ECR actions can be performed.

Policy Action	Description
ecr:GetAuthorizationToken	Allows the user to retrieve an encrypted authorization token. This token is used with the Docker CLI (via aws ecr get-login-password) to authenticate your local environment to the registry.

"Sid": "AllowReadOnlyPullAndMetadata"

Provides “Read-Only” access to ECR. It allows users to view repository details and pull (download) images, but does not allow them to upload (push), delete, or modify anything.

Policy Action	Description
ecr:BatchCheckLayerAvailability	Allows the user to check if the specific “layers” that make up a Docker image already exist in the repository.
ecr:GetDownloadUrlForLayer	Provides a URL to download a specific image layer; this is a background action required for the docker pull command to function.
ecr:BatchGetImage	Allows the user to retrieve the detailed information (manifests) for a specific set of images to facilitate downloading them.
ecr:DescribeRepositories	Allows the user to see a list of repositories within the registry and view their settings.
ecr:ListImages	Allows the user to view a list of all image tags and digests within a specific repository.
ecr:DescribeImages	Provides detailed metadata about specific images, such as the size, push date, and associated tags.
ecr:DescribeImageScanFindings	Allows the user to view the results of vulnerability scans performed on the images.

What You Cannot Do:

❌ Upload Images
❌ Delete Content
❌ Modify Settings

Policy Action	Description
Upload Images	A user cannot push new images or layers (actions not present in the policy: `ecr:PutImage`, `ecr:InitiateLayerUpload`, etc.).
Delete Content	A user cannot delete images, tags, or repositories (`ecr:BatchDeleteImage` and `ecr:DeleteRepository` are not included in the policy).
Modify Settings	A user cannot create repositories, change permissions, or update lifecycle policies (`ecr:CreateRepository`, `ecr:SetRepositoryPolicy`, and `ecr:PutLifecyclePolicy` are not included in the policy).

Example Scenario:

Priya is a data scientist who needs to pull the team’s base ML training image (acme-mlops-dev/ml-training:v2.1) to run experiments locally on her laptop.

Sample Permissions:

[
  {
    "Sid": "AllowECRAuth",
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AllowReadOnlyPullAndMetadata",
    "Effect": "Allow",
    "Action": [
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:DescribeImages",
      "ecr:DescribeImageScanFindings"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  }
]

Docker Command:

These two commands authenticate your local Docker client with a private Amazon Elastic Container Registry (ECR) and then pull the machine learning training image (ml-training:v2.1) used in your MLOps development environment. The authorization token provided by AWS is valid for 12 hours, after which you must run the login command again.

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.1

Explanation:

Part 1: Authentication

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com 

aws ecr get-login-password: This AWS CLI command retrieves the ECR authorization token and decodes it into a temporary password.
--region us-east-1: Specifies the AWS Region where your registry is hosted.
| (The Pipe): This takes the password generated by the first command and passes it directly as input to the next command.
docker login: Initializes the login process for a Docker registry.
--username AWS: For Amazon ECR, the username is always AWS.
--password-stdin: Tells Docker to read the password from the “standard input” (the pipe), which is more secure than typing it out.
123456789012.dkr.ecr.us-east-1.amazonaws.com: This is the unique URI for your private registry. It follows the format <account_id>.dkr.ecr.<region>.amazonaws.com.

Part 2: Pulling the Image

docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.1

docker pull: The standard command to download an image from a registry.
acme-mlops-dev/ml-training: The name of the specific repository within your ECR registry.
:v2.1: The specific version tag of the image you want to download.
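The URI format described above can be assembled programmatically. The following sketch uses hypothetical helper names and a basic account ID check; it only builds strings and does not contact AWS.

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Return the private registry host: <account_id>.dkr.ecr.<region>.amazonaws.com."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Return the full image reference used with docker pull and docker push."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


if __name__ == "__main__":
    print(ecr_image_uri("123456789012", "us-east-1", "acme-mlops-dev/ml-training", "v2.1"))
```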

Level 2: dev-read-write¶

Purpose: Grants the permissions developers need to build, tag, and push images while maintaining read-only visibility and the ability to create new repositories. It excludes destructive or administrative actions such as deleting repositories, modifying repository policies, or changing lifecycle policies.

Typical Users:

ML engineers (manual image builds)
DevOps engineers (troubleshooting)
Platform developers (base image maintenance)

Assignment: IAM Users only (not roles)

What You Can Do:

✅ Everything in read-only, PLUS:
✅ Push new Docker images
✅ Tag images
✅ Create new repositories
✅ Initiate image scans

"Sid": "AllowECRAuth"

Policy Action	Description
ecr:GetAuthorizationToken	Obtain a temporary password to authenticate a Docker CLI to the registry.

"Sid": "ReadOnlyAndDiscovery"

Policy Action	Description
ecr:DescribeRepositories	View metadata about existing repositories (e.g., URI, creation date).
ecr:DescribeImages	View metadata about images within a repository (e.g., push date, size, tags).
ecr:ListImages	Get a list of all image IDs in a repository.
ecr:BatchGetImage	Pull/download image manifest information for one or more images.
ecr:GetRepositoryPolicy	View the JSON resource-level policy attached to a repository.
ecr:GetLifecyclePolicy	View the rules that automatically clean up old images.
ecr:ListTagsForResource	View the tags (key-value pairs) assigned to an ECR resource.
ecr:DescribeImageScanFindings	View the security vulnerability reports for scanned images.

"Sid": "PushAndTagImages"

Policy Action	Description
ecr:BatchCheckLayerAvailability	Check if specific image layers already exist in the registry (used during push).
ecr:GetDownloadUrlForLayer	Retrieve a URL to download specific image layers.
ecr:InitiateLayerUpload	The first step of the process required to upload new image layers to a repository.
ecr:UploadLayerPart	The second step of the process required to upload new image layers to a repository.
ecr:CompleteLayerUpload	The third step of the process required to upload new image layers to a repository.
ecr:PutImage	Finalize the upload by adding the image manifest to the repository.
ecr:TagResource	Add or update tags (metadata) on ECR resources like repositories.

"Sid": "ManageRepositoriesAndScans"

Policy Action	Description
ecr:CreateRepository	Create entirely new, empty repositories.
ecr:StartImageScan	Manually trigger a vulnerability scan on an existing image.

What You Cannot Do:

❌ Delete images, repositories, or lifecycle policies
❌ Modify repository policies
❌ Change lifecycle policies

Policy Action	Description
Delete Resources	The policy lacks `ecr:DeleteRepository`, `ecr:BatchDeleteImage`, or `ecr:DeleteLifecyclePolicy`. Users cannot remove any data.
Modify Policies	The policy includes neither `ecr:SetRepositoryPolicy` nor `ecr:DeleteRepositoryPolicy`; users cannot change who has access to these resources.
Lifecycle Management	Users can view policies but cannot create or change them (`ecr:PutLifecyclePolicy`).
Public ECR	This policy applies to Private ECR. It does not grant permissions for ecr-public actions.

Example Scenario:

Tom is an ML engineer who built a new training image with updated dependencies. He needs to push it to ECR so the team can test it before integrating into the CI/CD pipeline.

Sample Permissions:

[
  {
    "Sid": "AllowECRAuth",
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ReadOnlyAndDiscovery",
    "Effect": "Allow",
    "Action": [
      "ecr:DescribeRepositories",
      "ecr:DescribeImages",
      "ecr:ListImages",
      "ecr:BatchGetImage",
      "ecr:GetRepositoryPolicy",
      "ecr:GetLifecyclePolicy",
      "ecr:ListTagsForResource",
      "ecr:DescribeImageScanFindings"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  },
  {
    "Sid": "PushAndTagImages",
    "Effect": "Allow",
    "Action": [
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:InitiateLayerUpload",
      "ecr:UploadLayerPart",
      "ecr:CompleteLayerUpload",
      "ecr:PutImage",
      "ecr:TagResource"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  },
  {
    "Sid": "ManageRepositoriesAndScans",
    "Effect": "Allow",
    "Action": [
      "ecr:CreateRepository",
      "ecr:StartImageScan"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  }
]

Docker Commands:

# Build and push
docker build -t ml-training:v2.2 .

docker tag ml-training:v2.2 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.2

docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/acme-mlops-dev/ml-training:v2.2

Why No Delete Permission: Image cleanup should be handled by ECR Lifecycle Policies (automated, safe) rather than manual deletion (error-prone, risky).

Level 3: ci-read-write¶

Purpose: CI/CD pipelines building and pushing images automatically

Typical Users:

GitHub Actions workflows
Jenkins build jobs
CodePipeline stages
GitLab CI runners

Assignment: IAM Roles only (not users)

What You Can Do:

✅ Push images from automated builds
✅ Tag images with build metadata
✅ Create repositories on-demand
✅ Read and preview lifecycle policies (for infra-as-code validation)

How It Differs from dev-read-write:

➕ Adds ecr:GetLifecyclePolicyPreview — CI pipelines validate lifecycle rules as part of infrastructure automation
➖ Removes ecr:StartImageScan and ecr:DescribeImageScanFindings — CI relies on ECR’s scan-on-push setting rather than triggering scans directly
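The difference between the two levels can be checked mechanically by comparing their action sets. The sketch below copies the actions from the sample permissions for each level; the `compare_levels` helper is a hypothetical name.

```python
# Actions from the dev-read-write sample permissions.
DEV_READ_WRITE = {
    "ecr:GetAuthorizationToken",
    "ecr:DescribeRepositories", "ecr:DescribeImages", "ecr:ListImages",
    "ecr:BatchGetImage", "ecr:GetRepositoryPolicy", "ecr:GetLifecyclePolicy",
    "ecr:ListTagsForResource", "ecr:DescribeImageScanFindings",
    "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer",
    "ecr:InitiateLayerUpload", "ecr:UploadLayerPart", "ecr:CompleteLayerUpload",
    "ecr:PutImage", "ecr:TagResource",
    "ecr:CreateRepository", "ecr:StartImageScan",
}

# Actions from the ci-read-write sample permissions.
CI_READ_WRITE = {
    "ecr:GetAuthorizationToken", "ecr:CreateRepository",
    "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer",
    "ecr:GetRepositoryPolicy", "ecr:DescribeRepositories", "ecr:ListImages",
    "ecr:DescribeImages", "ecr:BatchGetImage", "ecr:GetLifecyclePolicy",
    "ecr:GetLifecyclePolicyPreview", "ecr:ListTagsForResource",
    "ecr:InitiateLayerUpload", "ecr:UploadLayerPart", "ecr:CompleteLayerUpload",
    "ecr:PutImage", "ecr:TagResource",
}


def compare_levels(first: set, second: set) -> dict:
    """Return the actions unique to each level and the ones they share."""
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
        "shared": sorted(first & second),
    }


if __name__ == "__main__":
    diff = compare_levels(DEV_READ_WRITE, CI_READ_WRITE)
    print("dev only:", diff["only_in_first"])
    print("ci only:", diff["only_in_second"])
```

A comparison like this could run in CI to detect drift if either policy is edited later.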

What You Cannot Do:

❌ No Deletion: Actions like ecr:BatchDeleteImage or ecr:DeleteRepository are not included, preventing runners from removing version history or entire projects.
❌ No Security Modification: ecr:SetRepositoryPolicy and ecr:DeleteRepositoryPolicy are excluded, ensuring the pipeline cannot change who has access to the images.
❌ No Lifecycle Changes: While the runner can read lifecycle policies, it cannot modify or delete them (ecr:PutLifecyclePolicy), ensuring automated cleanup rules remain intact.

Example Scenario:

A GitHub Actions workflow automatically builds a new training image on every merge to main, tags it with the git commit SHA, and pushes it to ECR for deployment.

Sample Permissions:

[
  {
    "Sid": "AllowECRAuth",
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AllowRepositoryCreation",
    "Effect": "Allow",
    "Action": [
      "ecr:CreateRepository"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  },
  {
    "Sid": "ContinuousIntegrationReadWrite",
    "Effect": "Allow",
    "Action": [
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:GetRepositoryPolicy",
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:DescribeImages",
      "ecr:BatchGetImage",
      "ecr:GetLifecyclePolicy",
      "ecr:GetLifecyclePolicyPreview",
      "ecr:ListTagsForResource",
      "ecr:InitiateLayerUpload",
      "ecr:UploadLayerPart",
      "ecr:CompleteLayerUpload",
      "ecr:PutImage",
      "ecr:TagResource"
    ],
    "Resource": "arn:aws:ecr:*:*:repository/${company_prefix}-${env}/*"
  }
]

"Sid": "AllowRepositoryCreation"

Policy Action	Description
ecr:CreateRepository	Allows the user to create a new, empty repository to store Docker or OCI-compliant images.

"Sid": "AllowECRAuth"

Policy Action	Description
ecr:GetAuthorizationToken	Allows the user to request a short-lived password (token) to authenticate a Docker CLI client against ECR.

"Sid": "ContinuousIntegrationReadWrite"

Read/Pull Actions

Policy Action	Description
ecr:BatchCheckLayerAvailability	Checks if specific image layers already exist in the repository.
ecr:GetDownloadUrlForLayer	Retrieves a URL to download a specific image layer.
ecr:GetRepositoryPolicy	Allows the user to view the resource-based permissions policy of a repository.
ecr:DescribeRepositories	Returns metadata about repositories (e.g., creation date, URI, and settings).
ecr:ListImages	Lists basic information about the images stored in a repository.
ecr:DescribeImages	Provides detailed metadata about images, such as size, push date, and tags.
ecr:BatchGetImage	Allows the user to retrieve the image manifest or configuration for one or more images (required for pulling).
ecr:GetLifecyclePolicy	Retrieves the current lifecycle rules (which automate image deletion).
ecr:GetLifecyclePolicyPreview	Allows the user to see the results of a lifecycle policy before it is applied.
ecr:ListTagsForResource	Displays the tags (metadata) associated with a specific ECR repository.

Write/Push Actions

Policy Action	Description
ecr:InitiateLayerUpload	Starts the multi-step process of uploading an image layer.
ecr:UploadLayerPart	Allows the user to upload a specific segment of an image layer.
ecr:CompleteLayerUpload	Informs ECR that all parts of a layer have been uploaded and can be finalized.
ecr:PutImage	Finalizes the push process by uploading the image manifest, making the image available in the repository.
ecr:TagResource	Allows the user to add or update metadata tags on the repository itself.

GitHub Actions Example:

- name: Login to ECR
  run: |
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $ECR_REGISTRY

- name: Build and push
  run: |
    docker build -t $ECR_REGISTRY/acme-mlops-dev/ml-training:$GITHUB_SHA .
    docker push $ECR_REGISTRY/acme-mlops-dev/ml-training:$GITHUB_SHA

Compliance Note: In regulated industries, auditors can trace:

Human actions → IAM user CloudTrail logs (dev-read-write)
Automated actions → IAM role CloudTrail logs (ci-read-write)

This separation supports the separation-of-duties evidence expected in SOC 2, HIPAA, and PCI DSS audits.
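As a sketch of how an auditor might separate the two kinds of activity, the following classifies CloudTrail records by the `userIdentity.type` field, which CloudTrail sets to values such as `IAMUser` and `AssumedRole`. The mapping to "human" and "automation" assumes this guide's convention that people use IAM users and pipelines use IAM roles; in environments where people sign in through federated roles, it would not hold. The function names and sample events are illustrative.

```python
from collections import Counter


def actor_kind(event: dict) -> str:
    """Classify a CloudTrail record by its userIdentity.type field."""
    identity_type = event.get("userIdentity", {}).get("type")
    if identity_type == "IAMUser":
        return "human"  # dev-read-write is assigned to IAM users
    if identity_type == "AssumedRole":
        return "automation"  # ci-read-write is assigned to IAM roles
    return "other"


def tally_actors(events) -> Counter:
    """Count events per actor kind."""
    return Counter(actor_kind(e) for e in events)


if __name__ == "__main__":
    # Trimmed, made-up records for illustration only.
    sample = [
        {"eventName": "PutImage", "userIdentity": {"type": "IAMUser", "userName": "tom"}},
        {"eventName": "PutImage", "userIdentity": {"type": "AssumedRole"}},
        {"eventName": "PutImage", "userIdentity": {"type": "AWSService"}},
    ]
    print(tally_actors(sample))
```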

Level 4: full¶

This policy uses the wildcard ecr:* to grant all available permissions within the Amazon ECR service.

Purpose: Administrative access for repository management

Typical Users:

MLOps platform administrators
DevOps team leads
Security engineers (policy management)

What You Can Do:

Functionality	Policy Action	Description
✅ Delete Images and Repositories	`ecr:BatchDeleteImage`	Allows the deletion of multiple specified images within a repository.
	`ecr:DeleteRepository`	Grants the ability to permanently remove an entire repository.
✅ Modify Repository Policies	`ecr:SetRepositoryPolicy`	Allows you to apply or change resource-based policies to control who can access specific repositories.
	`ecr:DeleteRepositoryPolicy`	Enables the removal of existing repository access policies.
✅ Configure Lifecycle Policies	`ecr:PutLifecyclePolicy`	Allows creating or updating rules that automatically expire or delete old images based on age or count.
	`ecr:GetLifecyclePolicy`	Permits viewing the current automated cleanup rules.
✅ Set Up Cross-Region Replication	`ecr:PutReplicationConfiguration`	Grants permission to configure settings that automatically copy images to other AWS regions or accounts.
✅ Manage Image Scanning Settings	`ecr:PutImageScanningConfiguration`	Allows you to enable or disable automatic vulnerability scanning upon image push.
	`ecr:StartImageScan`	Permits manually triggering a security scan for a specific image.
✅ Repository and Image Management	`ecr:CreateRepository`	Allows the creation of new private repositories to store container images.
	`ecr:DescribeRepositories` and `ecr:DescribeImages`	Provides the ability to list and view metadata for all repositories and images.
	`ecr:PutImage`	Allows pushing new container images or updating existing ones.

What You Cannot Do:

❌ No Restrictions - This policy is designed for full administrative access; there are no denied actions within the ECR service scope.

Example Scenario:

Lisa is the platform administrator who needs to configure an ECR Lifecycle Policy to automatically delete untagged images after 7 days and keep only the last 10 tagged images per repository, reducing storage costs.
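Lisa's two rules could be expressed roughly as follows. This is a minimal sketch that builds the lifecycle policy document in Python and prints it as JSON. The function name and the `tag_prefixes` default of `"v"` are assumptions for illustration: a rule with tagStatus "tagged" needs a tag prefix or pattern list to select images, and the guide does not say which tags the team uses.

```python
import json


def build_lifecycle_policy(untagged_days=7, keep_tagged=10, tag_prefixes=("v",)):
    """Return an ECR lifecycle policy document as a dict.

    tag_prefixes is an assumption: adjust it to the tag scheme in use.
    """
    return {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Expire untagged images after {untagged_days} days",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": untagged_days,
                },
                "action": {"type": "expire"},
            },
            {
                "rulePriority": 2,
                "description": f"Keep only the last {keep_tagged} tagged images",
                "selection": {
                    "tagStatus": "tagged",
                    "tagPrefixList": list(tag_prefixes),
                    "countType": "imageCountMoreThan",
                    "countNumber": keep_tagged,
                },
                "action": {"type": "expire"},
            },
        ]
    }


policy_text = json.dumps(build_lifecycle_policy(), indent=2)
print(policy_text)
```

The printed JSON can be saved to a file and applied with `aws ecr put-lifecycle-policy --repository-name <repo> --lifecycle-policy-text file://policy.json`, which requires the ecr:PutLifecyclePolicy action listed above.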

Security Note: ⚠️ Most users need only read-only or dev-read-write. Reserve full access for platform administrators. Because ecr:* includes ecr:GetAuthorizationToken, holders of this policy can authenticate their Docker CLI with the registry without any additional policy.

Sample Permissions:

[
  {
    "Sid": "FullECRAdminAccess",
    "Effect": "Allow",
    "Action": [
      "ecr:*"
    ],
    "Resource": "*"
  }
]

Pipeline Policies¶

SageMaker Pipeline policies control access to ML training and deployment workflows.

Enterprise Compliance Model¶

Pipeline policies follow a 4-level model that separates human access from automation access - mirroring the ECR pattern for consistency and compliance.

Key Distinction:

project-dev → Assigned to IAM users (humans)
project-ci → Assigned to IAM roles (CI/CD automation)

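The two levels share the core SageMaker Pipeline management actions but are not identical: project-ci drops the CodePipeline, CodeBuild, and log-reading actions and adds the S3, ECR, configuration/secrets, and iam:PassRole permissions that runners need. The assignment pattern ensures audit trails clearly show human vs automated actions.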

Level 1: read-only¶

Purpose: This policy is designed for read-only governance and monitoring across CI/CD and Machine Learning workflows. It allows a user to audit the status, history, and logs of automated pipelines without the ability to create, modify, or delete resources.

The primary goal is to provide full visibility into the state of AWS CodePipeline and SageMaker Model Building Pipelines. It is ideal for auditors, project managers, or automated monitoring tools that need to track deployment progress and execution history across an entire AWS account.

Typical Users:

Auditors and compliance reviewers
Model risk managers
Executive stakeholders
New team members learning the platform
Cross-team visibility roles

What You Can Do (CI/CD pipelines):

✅ View the full architecture and configuration of any pipeline.
✅ Access the complete execution history for auditing purposes.
✅ Monitor the live progress of a running pipeline and review its logs.
✅ Examine pipeline steps and configurations (e.g., environment variables, source branches).
✅ Access and export lists of pipelines and executions for compliance reporting.

What You Cannot Do (Restrictive actions for CI/CD pipelines):

❌ Create or Modify: You cannot change the pipeline’s structure, add new stages, or delete existing ones.
❌ Start Executions: You are barred from manually triggering a new pipeline run.
❌ Stop/Cancel: You cannot intervene in an active process to stop or roll it back.
❌ Delete: You do not have the permission to remove pipeline resources or execution history.

What You Can Do (SageMaker pipelines):

✅ List all available pipelines in the account to provide a high-level overview for audit purposes.
✅ View the history of all pipeline runs, allowing auditors to see when and how many times a workflow was triggered.
✅ Examine the individual steps within a specific execution to verify that each stage (e.g., training, processing) completed as expected.
✅ Retrieve the metadata and configuration of a pipeline definition to review its architectural design.
✅ View the current status (e.g., Succeeded, Failed) and specific details of a single execution run.
✅ Access the exact version of the pipeline definition used for a specific historical run, ensuring the “as-run” configuration is verifiable.

What You Cannot Do (Restrictive actions for SageMaker pipelines):

❌ sagemaker:CreatePipeline: You cannot create new workflows, which could otherwise bypass established compliance checks.
❌ sagemaker:UpdatePipeline: You cannot alter existing pipeline definitions, so validated definitions stay unchanged.
❌ sagemaker:StartPipelineExecution: You cannot trigger new runs, which prevents unauthorized compute costs or production changes.
❌ sagemaker:StopPipelineExecution: You cannot interrupt active, ongoing production workloads.
❌ sagemaker:DeletePipeline: You cannot remove pipeline definitions or their historical audit trails.

Example Scenario:

Rachel is a model risk manager who needs to review all ML training pipelines quarterly to ensure they meet compliance requirements for bias detection and data validation. She needs to see pipeline configurations and execution logs but should not be able to trigger or modify any workflows.

Sample Permissions:

[
  {
    "Sid": "CodePipelineReadOnly",
    "Effect": "Allow",
    "Action": [
      "codepipeline:GetPipeline",
      "codepipeline:GetPipelineExecution",
      "codepipeline:GetPipelineState",
      "codepipeline:ListPipelines",
      "codepipeline:ListPipelineExecutions",
      "codepipeline:ListActionTypes",
      "codepipeline:ListTagsForResource"
    ],
    "Resource": "*"
  },
  {
    "Sid": "CodeBuildReadOnly",
    "Effect": "Allow",
    "Action": [
      "codebuild:BatchGetBuilds",
      "codebuild:ListBuilds"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PipelineLogsReadOnly",
    "Effect": "Allow",
    "Action": [
      "logs:GetLogEvents",
      "logs:DescribeLogStreams"
    ],
    "Resource": "*"
  },
  {
    "Sid": "SagemakerPipelineReadOnly",
    "Effect": "Allow",
    "Action": [
      "sagemaker:ListPipelines",
      "sagemaker:ListPipelineExecutions",
      "sagemaker:ListPipelineExecutionSteps",
      "sagemaker:DescribePipeline",
      "sagemaker:DescribePipelineExecution",
      "sagemaker:DescribePipelineDefinitionForExecution",
      "sagemaker:GetSearchSuggestions"
    ],
    "Resource": "*"
  }
]
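One way to sanity-check that a policy like the one above stays read-only is to flag any allowed action whose name does not begin with a read-style verb. The sketch below is a heuristic, not an AWS feature: the prefix list and the function name are assumptions, and a prefix match says nothing about how sensitive the returned data is.

```python
# Verbs that conventionally denote read-only API calls (an assumed heuristic).
READ_ONLY_PREFIXES = ("Get", "List", "Describe", "BatchGet")


def non_read_only_actions(statements):
    """Return actions in Allow statements that do not start with a read-style verb."""
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            # "service:ActionName" -> "ActionName"; wildcards such as "ecr:*" are flagged.
            _, _, name = action.partition(":")
            if not name.startswith(READ_ONLY_PREFIXES):
                flagged.append(action)
    return flagged


# Two statements copied from the sample policy above.
read_only_sample = [
    {
        "Sid": "CodeBuildReadOnly",
        "Effect": "Allow",
        "Action": ["codebuild:BatchGetBuilds", "codebuild:ListBuilds"],
        "Resource": "*",
    },
    {
        "Sid": "PipelineLogsReadOnly",
        "Effect": "Allow",
        "Action": ["logs:GetLogEvents", "logs:DescribeLogStreams"],
        "Resource": "*",
    },
]

print(non_read_only_actions(read_only_sample))
```

A check like this could run in a pre-merge step for policy changes, so that a write action added to the read-only level is caught before review.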

"Sid": "CodePipelineReadOnly"

Grants read-only access to all AWS CodePipeline resources across the account. Uses Resource: * intentionally — auditors and governance teams need account-wide visibility across all tenants to perform compliance reviews.

Policy Action	Description
codepipeline:GetPipeline	View the detailed definition and structure of a pipeline.
codepipeline:GetPipelineExecution	View the status and details of a specific execution instance.
codepipeline:GetPipelineState	Monitor the real-time status of each stage and action within a pipeline.
codepipeline:ListPipelines	List all available pipelines in the account across all tenants.
codepipeline:ListPipelineExecutions	View the history of all past and current pipeline runs.
codepipeline:ListActionTypes	See what types of actions (e.g., Build, Deploy, Test) are available for use.
codepipeline:ListTagsForResource	Review metadata tags used for cost tracking and organizational governance.

"Sid": "CodeBuildReadOnly"

Grants read-only access to all AWS CodeBuild projects across the account. Uses Resource: * for the same governance reason — auditors need visibility into build jobs across all tenants.

Policy Action	Description
codebuild:BatchGetBuilds	View details of specific build jobs triggered by pipelines.
codebuild:ListBuilds	List all build jobs for visibility into build history across all tenants.

"Sid": "PipelineLogsReadOnly"

Grants read-only access to CloudWatch Logs for reviewing pipeline and build execution logs. Uses Resource: * because log group names are generated by AWS services at runtime and do not follow a predictable naming pattern.

Policy Action	Description
logs:GetLogEvents	Access execution logs for audit trails and compliance reviews.
logs:DescribeLogStreams	List available log streams to locate relevant log output.

"Sid": "SagemakerPipelineReadOnly"

Grants read-only access to all SageMaker Pipelines across the account. Uses Resource: * intentionally — auditors need to review ML pipeline configurations, execution history, and step-level details across all tenants for compliance verification.

Policy Action	Description
sagemaker:ListPipelines	List all available pipelines in the account to provide a high-level overview for audit purposes.
sagemaker:ListPipelineExecutions	View the history of all pipeline runs, allowing auditors to see when and how many times a workflow was triggered.
sagemaker:ListPipelineExecutionSteps	Examine the individual steps within a specific execution to verify that each stage (e.g., training, processing) completed as expected.
sagemaker:DescribePipeline	Retrieve the metadata and configuration of a pipeline definition to review its architectural design.
sagemaker:DescribePipelineExecution	View the current status (e.g., Succeeded, Failed) and specific details of a single execution run.
sagemaker:DescribePipelineDefinitionForExecution	Access the exact version of the pipeline definition used for a specific historical run, ensuring the “as-run” configuration is verifiable.
sagemaker:GetSearchSuggestions	Use autocomplete/suggestions when searching for SageMaker resources.

Resource Scope: All four Sids use Resource: * intentionally. Read-only governance access requires account-wide visibility across all tenants — auditors must be able to review any team’s pipelines, builds, and execution logs to perform compliance assessments. This is consistent with how AWS managed policies like ReadOnlyAccess and AWSCloudTrail_ReadOnlyAccess are designed.

Compliance Use Case: In regulated industries, auditors must verify that ML pipelines include required validation steps (data quality checks, bias detection, model explainability). Read-only access enables these reviews without risk of accidental modifications.

Level 2: project-dev¶

Purpose: Human developers creating and managing ML pipelines manually. This policy grants the ability to build, iterate on, and execute both CI/CD delivery pipelines (CodePipeline/CodeBuild) and ML workflow pipelines (SageMaker) within a tenant-scoped boundary.

Typical Users:

Data scientists (manual pipeline runs and experimentation)
ML engineers (pipeline development and iteration)
Research teams (prototyping ML workflows)

Assignment: IAM Users only (not roles)

How It Differs from read-only:

➕ Adds CodePipeline write actions — create, update, start, stop, retry pipelines (read-only has view-only access)
➕ Adds CodeBuild write actions — start and stop builds (read-only can only view build results)
➕ Adds SageMaker Pipeline write actions — create, update, start, stop executions (read-only can only view pipeline state)
➕ Adds codepipeline:TagResource — organize pipeline resources with metadata tags
🔒 Tightens resource scoping — CodePipeline, CodeBuild, and SageMaker are scoped to {company_prefix}-{env}-{tenant_id}-* (read-only uses Resource: * for account-wide governance visibility)
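➖ Removes sagemaker:DescribePipelineDefinitionForExecution, sagemaker:GetSearchSuggestions, and sagemaker:ListPipelines: the first two are governance/audit actions not needed for active development, and ListPipelines is absent from the project-dev sample policy (project-ci adds it back)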
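The differences between two levels can be computed rather than read off by eye. The sketch below compares the SageMaker action lists from the read-only and project-dev sample policies on this page. The helper name `diff_actions` is hypothetical.

```python
def diff_actions(base, other):
    """Return the actions added to and removed from `base` to arrive at `other`."""
    base_set, other_set = set(base), set(other)
    return {
        "added": sorted(other_set - base_set),
        "removed": sorted(base_set - other_set),
    }


# SageMaker actions from the read-only sample policy.
READ_ONLY_SAGEMAKER = [
    "sagemaker:ListPipelines",
    "sagemaker:ListPipelineExecutions",
    "sagemaker:ListPipelineExecutionSteps",
    "sagemaker:DescribePipeline",
    "sagemaker:DescribePipelineExecution",
    "sagemaker:DescribePipelineDefinitionForExecution",
    "sagemaker:GetSearchSuggestions",
]

# SageMaker actions from the project-dev sample policy.
PROJECT_DEV_SAGEMAKER = [
    "sagemaker:CreatePipeline",
    "sagemaker:UpdatePipeline",
    "sagemaker:DescribePipeline",
    "sagemaker:StartPipelineExecution",
    "sagemaker:StopPipelineExecution",
    "sagemaker:DescribePipelineExecution",
    "sagemaker:ListPipelineExecutions",
    "sagemaker:ListPipelineExecutionSteps",
]

changes = diff_actions(READ_ONLY_SAGEMAKER, PROJECT_DEV_SAGEMAKER)
for kind, actions in changes.items():
    print(kind, actions)
```

Running the same comparison for the CodePipeline and CodeBuild statements gives a quick check that the "How It Differs" list and the sample policies stay in sync.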

What You Can Do (CI/CD pipelines):

✅ Create and update CodePipeline definitions for your project
✅ Start and stop pipeline executions manually
✅ Retry failed stages during development
✅ Start and monitor CodeBuild projects
✅ View build logs for debugging
✅ Tag pipeline resources for organization

What You Can Do (SageMaker pipelines):

✅ Create and update SageMaker Pipeline definitions
✅ Start and stop pipeline executions manually
✅ View execution history, step details, and pipeline configurations
✅ Iterate on pipeline design with different parameters

What You Cannot Do:

❌ No Deletion: codepipeline:DeletePipeline and sagemaker:DeletePipeline are excluded — pipelines are removed through admin-level access only, protecting execution history and audit trails.
❌ No Cross-Tenant Access: Resource scoping limits access to pipelines matching {company_prefix}-{env}-{tenant_id}-*, preventing access to other teams’ workflows.
❌ No Platform-Wide Settings: Cannot modify account-level CodePipeline or SageMaker configurations.

Example Scenario:

Tom is an ML engineer developing a new fraud detection pipeline. He creates a SageMaker Pipeline definition in Python, triggers training runs with different hyperparameters, and monitors execution steps — while the CodePipeline he set up automatically rebuilds the pipeline on each commit to his feature branch.

Sample Permissions:

[
  {
    "Sid": "CodePipelineDevAccess",
    "Effect": "Allow",
    "Action": [
      "codepipeline:CreatePipeline",
      "codepipeline:UpdatePipeline",
      "codepipeline:GetPipeline",
      "codepipeline:GetPipelineExecution",
      "codepipeline:GetPipelineState",
      "codepipeline:ListPipelines",
      "codepipeline:ListPipelineExecutions",
      "codepipeline:ListActionTypes",
      "codepipeline:ListTagsForResource",
      "codepipeline:StartPipelineExecution",
      "codepipeline:StopPipelineExecution",
      "codepipeline:RetryStageExecution",
      "codepipeline:TagResource"
    ],
    "Resource": "arn:aws:codepipeline:*:*:{company_prefix}-{env}-{tenant_id}-*"
  },
  {
    "Sid": "CodeBuildDevAccess",
    "Effect": "Allow",
    "Action": [
      "codebuild:StartBuild",
      "codebuild:StopBuild",
      "codebuild:BatchGetBuilds",
      "codebuild:ListBuilds"
    ],
    "Resource": "arn:aws:codebuild:*:*:project/{company_prefix}-{env}-{tenant_id}-*"
  },
  {
    "Sid": "PipelineLogsAccess",
    "Effect": "Allow",
    "Action": [
      "logs:GetLogEvents",
      "logs:DescribeLogStreams"
    ],
    "Resource": "*"
  },
  {
    "Sid": "SagemakerPipelineDevAccess",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreatePipeline",
      "sagemaker:UpdatePipeline",
      "sagemaker:DescribePipeline",
      "sagemaker:StartPipelineExecution",
      "sagemaker:StopPipelineExecution",
      "sagemaker:DescribePipelineExecution",
      "sagemaker:ListPipelineExecutions",
      "sagemaker:ListPipelineExecutionSteps"
    ],
    "Resource": "arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{tenant_id}-*"
  }
]

"Sid": "CodePipelineDevAccess"

Grants development-level access to AWS CodePipeline, scoped to pipelines matching the tenant’s naming convention. This ensures developers can only manage their own team’s CI/CD pipelines.

Pipeline Management

Policy Action	Description
codepipeline:CreatePipeline	Create new CI/CD pipeline definitions for the project.
codepipeline:UpdatePipeline	Modify existing pipeline stages, actions, and configurations.
codepipeline:StartPipelineExecution	Manually trigger a pipeline run.
codepipeline:StopPipelineExecution	Cancel a running pipeline execution.
codepipeline:RetryStageExecution	Re-run a failed stage without restarting the entire pipeline.
codepipeline:TagResource	Add or update metadata tags on pipeline resources.

Read/Monitor Actions

Policy Action	Description
codepipeline:GetPipeline	View the detailed definition and structure of a pipeline.
codepipeline:GetPipelineExecution	View the status and details of a specific execution.
codepipeline:GetPipelineState	Monitor the real-time status of each stage and action.
codepipeline:ListPipelines	List all available pipelines in the account.
codepipeline:ListPipelineExecutions	View the history of all pipeline runs.
codepipeline:ListActionTypes	See what types of actions (Build, Deploy, Test) are available.
codepipeline:ListTagsForResource	Review metadata tags for cost tracking and governance.

"Sid": "CodeBuildDevAccess"

Grants development-level access to AWS CodeBuild projects, scoped to build projects matching the tenant’s naming convention.

Policy Action	Description
codebuild:StartBuild	Trigger a CodeBuild project to compile, test, or package code.
codebuild:StopBuild	Cancel a running build job.
codebuild:BatchGetBuilds	View details of specific build jobs triggered by the pipeline.
codebuild:ListBuilds	List all build jobs for visibility into build history.

"Sid": "PipelineLogsAccess"

Grants read access to CloudWatch Logs for debugging pipeline and build execution. This Sid uses Resource: * because log group names are generated by CodePipeline and CodeBuild at runtime and do not follow a predictable tenant-scoped naming pattern.

Policy Action	Description
logs:GetLogEvents	Access execution logs for debugging and troubleshooting.
logs:DescribeLogStreams	List available log streams to locate relevant log output.

"Sid": "SagemakerPipelineDevAccess"

Grants development-level access to SageMaker ML pipelines, scoped to the tenant’s project namespace.

Pipeline Management

Policy Action	Description
sagemaker:CreatePipeline	Define a new SageMaker Pipeline for ML workflows (training, processing, evaluation).
sagemaker:UpdatePipeline	Modify an existing pipeline definition to iterate on the workflow design.
sagemaker:StartPipelineExecution	Trigger a pipeline run with specified parameters (e.g., hyperparameters, data paths).
sagemaker:StopPipelineExecution	Cancel a running execution to stop compute costs or abort a misconfigured run.

Read/Monitor Actions

Policy Action	Description
sagemaker:DescribePipeline	View the metadata and configuration of a pipeline definition.
sagemaker:DescribePipelineExecution	View the status, parameters, and details of a specific execution run.
sagemaker:ListPipelineExecutions	View the history of all runs for a pipeline to track iteration progress.
sagemaker:ListPipelineExecutionSteps	Examine individual steps within an execution to identify which step failed or succeeded.

Resource Scope: Three of the four Sids are tenant-scoped to {company_prefix}-{env}-{tenant_id}-*, ensuring developers can only access their own team’s resources. The exception is PipelineLogsAccess, which uses Resource: * because CloudWatch log group names are generated by AWS services at runtime and do not follow a predictable tenant-scoped naming pattern. This is a known AWS limitation; log access can be further restricted via log group resource policies as the platform matures.
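To see what the tenant-scoped pattern admits, the following sketch renders the SageMaker resource pattern for one tenant and tests ARNs against it. It approximates IAM wildcard matching ('*' for any run of characters, '?' for one character) with a regular expression. The function names are hypothetical, and the tenant values (acme, dev, a001) are borrowed from the examples in this guide.

```python
import re


def iam_pattern_to_regex(pattern):
    """Translate an IAM-style resource pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z", re.DOTALL)


def tenant_pipeline_pattern(company_prefix, env, tenant_id):
    """Render the tenant-scoped SageMaker pipeline pattern used in the sample policy."""
    return f"arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{tenant_id}-*"


def in_tenant_scope(arn, company_prefix, env, tenant_id):
    """Return True if the ARN falls inside the tenant's pipeline namespace."""
    pattern = tenant_pipeline_pattern(company_prefix, env, tenant_id)
    return iam_pattern_to_regex(pattern).match(arn) is not None


own = "arn:aws:sagemaker:us-west-2:123456789012:pipeline/acme-dev-a001-fraud-detection"
other = "arn:aws:sagemaker:us-west-2:123456789012:pipeline/acme-dev-a002-churn"
print(in_tenant_scope(own, "acme", "dev", "a001"))
print(in_tenant_scope(other, "acme", "dev", "a001"))
```

The hyphen before the trailing wildcard matters: without it, a tenant ID such as a001 would also match resources belonging to a tenant named a0011.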

Level 3: project-ci¶

Purpose: This IAM policy provides a self-contained set of permissions for CI/CD runners (GitHub Actions, Jenkins, GitLab CI, AWS CodePipeline) to automate end-to-end ML workflows. It includes SageMaker Pipeline orchestration, container registry access, pipeline asset retrieval, configuration/secrets access, and the ability to pass execution roles to SageMaker — everything a CI/CD runner needs to function without requiring additional policy assignments.

Unlike project-dev (which targets human developers working across both CodePipeline and SageMaker), project-ci is focused on automated SageMaker Pipeline workflows with the supporting infrastructure permissions that runners need to operate independently.

Typical Users:

GitHub Actions workflows
Jenkins build jobs
GitLab CI runners
AWS CodePipeline stages

Assignment: IAM Roles only (not users)

How It Differs from project-dev:

➖ Removes CodePipeline/CodeBuild management actions: CI/CD runners interact with SageMaker Pipelines directly, not through the CodePipeline console
➖ Removes CloudWatch logs read actions — runners capture logs through their own logging mechanisms (GitHub Actions logs, Jenkins console output)
➕ Adds S3 read access for pipeline assets — runners need to download pipeline definitions, code, and model artifacts
➕ Adds Secrets Manager and SSM Parameter Store read access — runners need configuration and secrets for pipeline execution
➕ Adds ECR push/pull access — runners build and push container images used in ML pipeline steps
➕ Adds iam:PassRole — runners must pass execution roles to SageMaker for training and processing jobs
➕ Adds sagemaker:ListPipelines — runners need to discover existing pipelines to decide whether to create or update

What You Can Do:

✅ Create and update SageMaker Pipeline definitions from code
✅ Discover existing pipelines to determine create vs update
✅ Trigger pipeline executions automatically on code merge
✅ Stop pipelines on failure conditions
✅ Monitor execution progress and report step-level status
✅ Download pipeline definitions and artifacts from S3
✅ Retrieve configuration and secrets for pipeline execution
✅ Authenticate to ECR and push/pull container images
✅ Pass execution roles to SageMaker for training and processing jobs

What You Cannot Do:

❌ Delete pipelines
❌ Access other teams’ pipelines
❌ Modify platform-wide settings
❌ Delete container images or repositories
❌ Modify ECR repository policies or lifecycle rules
❌ Write to S3 (read-only access for pipeline assets)
❌ Create or modify secrets/parameters (read-only access)
❌ Pass roles to services other than SageMaker

Restriction	Description
No Pipeline Deletion	`sagemaker:DeletePipeline` is excluded because pipeline removal is an admin-level action. CI/CD should create and update, not destroy. If a pipeline needs removal, that’s a human decision made through full (admin-level) access; project-dev also excludes deletion.
No Cross-Tenant Access	SageMaker actions are scoped to `{company_prefix}-{env}-{tenant_id}-*`, preventing access to other teams’ pipelines.
No Container Deletion	ECR actions do not include `ecr:BatchDeleteImage` or `ecr:DeleteRepository` — image cleanup should be handled by ECR Lifecycle Policies, not CI/CD runners.
No S3 Write Access	Runners can read pipeline assets but cannot modify or delete them — pipeline definitions are managed through version control, not runner writes.
No Secrets Modification	Runners can read secrets and parameters but cannot create, update, or delete them — secrets management is an admin responsibility.
No Unrestricted Role Passing	`iam:PassRole` is conditioned to `sagemaker.amazonaws.com` only — runners cannot pass roles to other services, limiting blast radius.

Example Scenario:

A GitHub Actions workflow triggers on merge to main. It downloads the pipeline definition from S3, pulls the base training image from ECR, builds a new image with updated code, pushes it to ECR, creates/updates the SageMaker Pipeline definition, passes the SageMaker execution role, starts a training run, and monitors step-level execution status — reporting results back to the GitHub PR.

Sample Permissions:

[
  {
    "Sid": "SageMakerPipelineManagement",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreatePipeline",
      "sagemaker:UpdatePipeline",
      "sagemaker:DescribePipeline",
      "sagemaker:ListPipelines",
      "sagemaker:StartPipelineExecution",
      "sagemaker:StopPipelineExecution",
      "sagemaker:DescribePipelineExecution",
      "sagemaker:ListPipelineExecutions",
      "sagemaker:ListPipelineExecutionSteps"
    ],
    "Resource": "arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{tenant_id}-*"
  },
  {
    "Sid": "PipelineAssetAccess",
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:ListBucket"
    ],
    "Resource": [
      "arn:aws:s3:::{company_prefix}-{env}-{tenant_id}-*",
      "arn:aws:s3:::{company_prefix}-{env}-{tenant_id}-*/*"
    ]
  },
  {
    "Sid": "ConfigurationAndSecretsAccess",
    "Effect": "Allow",
    "Action": [
      "secretsmanager:GetSecretValue",
      "ssm:GetParameter",
      "ssm:GetParameters"
    ],
    "Resource": [
      "arn:aws:secretsmanager:*:*:secret:{company_prefix}-{env}-{tenant_id}-*",
      "arn:aws:ssm:*:*:parameter/{company_prefix}-{env}-{tenant_id}/*"
    ]
  },
  {
    "Sid": "ContainerRegistryAccess",
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:DescribeImages",
      "ecr:InitiateLayerUpload",
      "ecr:UploadLayerPart",
      "ecr:CompleteLayerUpload",
      "ecr:PutImage",
      "ecr:TagResource"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToSageMaker",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-{tenant_id}-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "sagemaker.amazonaws.com"
      }
    }
  }
]

"Sid": "SageMakerPipelineManagement"

Grants the CI/CD runner full lifecycle management of SageMaker Pipelines (except deletion), scoped to the tenant’s project namespace.

Pipeline Definition Management

Policy Action	Description
sagemaker:CreatePipeline	Create a new SageMaker Pipeline definition from code when a pipeline doesn’t yet exist.
sagemaker:UpdatePipeline	Update an existing pipeline definition when code is merged — the primary action for iterative CI/CD deployments.
sagemaker:DescribePipeline	Retrieve metadata and configuration of a pipeline definition. Required for the runner to verify the current state before applying updates.
sagemaker:ListPipelines	Discover existing pipelines in the tenant namespace. Required for the runner to determine whether to create a new pipeline or update an existing one.

Pipeline Execution Management

Policy Action	Description
sagemaker:StartPipelineExecution	Trigger a pipeline run automatically after a successful build or code merge.
sagemaker:StopPipelineExecution	Halt a running execution if automated tests or failure conditions are detected.
sagemaker:DescribePipelineExecution	View the status, parameters, and details of a specific execution run. Required for the runner to report pass/fail status back to GitHub/Jenkins.
sagemaker:ListPipelineExecutions	View the history of all runs for a pipeline. Required for the runner to check if a previous execution is still running before starting a new one.
sagemaker:ListPipelineExecutionSteps	Examine individual steps within an execution. Required for the runner to report step-level status (e.g., “training step failed at epoch 5”) back to the CI/CD system.
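The create-versus-update decision described in these tables can be sketched as below. This is an illustration only, not the contents of the deploy_pipeline.py script referenced in this guide. `upsert_pipeline` and `InMemoryClient` are hypothetical names. The method and parameter names follow the boto3 SageMaker client (`list_pipelines`, `create_pipeline`, `update_pipeline`), and pagination of `list_pipelines` is omitted for brevity.

```python
def upsert_pipeline(client, name, definition, role_arn):
    """Create the pipeline if it is absent, otherwise update it. Returns the action taken."""
    response = client.list_pipelines(PipelineNamePrefix=name)
    existing = {p["PipelineName"] for p in response.get("PipelineSummaries", [])}
    if name in existing:
        client.update_pipeline(
            PipelineName=name, PipelineDefinition=definition, RoleArn=role_arn
        )
        return "updated"
    client.create_pipeline(
        PipelineName=name, PipelineDefinition=definition, RoleArn=role_arn
    )
    return "created"


class InMemoryClient:
    """Hypothetical stand-in for a SageMaker client, used only to exercise the logic."""

    def __init__(self):
        self.pipelines = {}

    def list_pipelines(self, PipelineNamePrefix=""):
        names = [n for n in self.pipelines if n.startswith(PipelineNamePrefix)]
        return {"PipelineSummaries": [{"PipelineName": n} for n in names]}

    def create_pipeline(self, PipelineName, PipelineDefinition, RoleArn):
        self.pipelines[PipelineName] = PipelineDefinition

    def update_pipeline(self, PipelineName, PipelineDefinition, RoleArn):
        self.pipelines[PipelineName] = PipelineDefinition


client = InMemoryClient()
role = "arn:aws:iam::123456789012:role/acme-dev-a001-role-sagemaker"  # assumed role name
print(upsert_pipeline(client, "acme-dev-a001-fraud-detection", "{}", role))
print(upsert_pipeline(client, "acme-dev-a001-fraud-detection", "{}", role))
```

In a real runner the stand-in would be replaced by `boto3.client("sagemaker")`. Passing `RoleArn` is the step that exercises the iam:PassRole permission in this policy.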

"Sid": "PipelineAssetAccess"

Grants read-only access to S3 buckets within the tenant namespace for downloading pipeline definitions, code artifacts, and model artifacts.

Policy Action	Description
s3:GetObject	Download pipeline definition files, code packages, and model artifacts stored in S3.
s3:ListBucket	List objects within the tenant’s S3 buckets to verify that required assets exist before pipeline execution.

"Sid": "ConfigurationAndSecretsAccess"

Grants read-only access to configuration and secrets required for pipeline execution, scoped to the tenant namespace.

Policy Action	Description
secretsmanager:GetSecretValue	Retrieve sensitive data (API keys, database credentials, external service tokens) needed during pipeline execution.
ssm:GetParameter	Read a single configuration parameter (e.g., model hyperparameters, feature store endpoints).
ssm:GetParameters	Read multiple configuration parameters in a single call for efficient pipeline initialization.

"Sid": "ContainerRegistryAccess"

Enables the CI/CD runner to authenticate with ECR, pull base images, build new images, and push them to the registry for use in ML pipeline steps.

Authentication

Policy Action	Description
ecr:GetAuthorizationToken	Retrieve a temporary authentication token to authenticate the Docker CLI to the registry.

Read/Pull Actions

Policy Action	Description
ecr:BatchCheckLayerAvailability	Check if specific image layers already exist in the repository (used during both pull and push).
ecr:GetDownloadUrlForLayer	Retrieve a URL to download a specific image layer for pulling base images.
ecr:BatchGetImage	Retrieve image manifests for pulling base images used in pipeline steps.
ecr:DescribeRepositories	View repository metadata to verify target repositories exist before pushing.
ecr:ListImages	List images in a repository to check for existing tags and avoid unnecessary rebuilds.
ecr:DescribeImages	View image metadata (size, push date, tags) for build cache optimization.

Write/Push Actions

Policy Action	Description
ecr:InitiateLayerUpload	Start the multi-step process of uploading a new image layer.
ecr:UploadLayerPart	Upload a segment of an image layer during the push process.
ecr:CompleteLayerUpload	Finalize the layer upload, confirming all parts have been received.
ecr:PutImage	Push the image manifest to the repository, making the complete container image available for use in pipeline steps.
ecr:TagResource	Add or update metadata tags on repositories (e.g., build number, commit SHA).

"Sid": "PassRoleToSageMaker"

Permits the CI/CD runner to pass an execution role to SageMaker so that pipeline steps (training jobs, processing jobs, transform jobs) have the compute permissions they need.

Policy Action	Description
iam:PassRole	Assign a specific service role to the SageMaker Pipeline being created or executed. Conditioned to `sagemaker.amazonaws.com` only — the runner cannot pass roles to any other AWS service, limiting the blast radius of a compromised runner. Scoped to roles matching `{company_prefix}-{env}-{tenant_id}-role-*` to prevent passing arbitrary roles.
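The two guards on iam:PassRole (the role-name pattern and the iam:PassedToService condition) can be illustrated with a small local check. This is a simplified model for reasoning about the statement, not how IAM evaluates requests internally. The function name is hypothetical, the tenant values are taken from the examples in this guide, and `fnmatchcase` only approximates IAM wildcards (it also interprets square brackets, which IAM does not).

```python
from fnmatch import fnmatchcase

# The PassRoleToSageMaker statement with example tenant values filled in.
PASS_ROLE_STATEMENT = {
    "Sid": "PassRoleToSageMaker",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::*:role/acme-dev-a001-role-*",
    "Condition": {
        "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
    },
}


def pass_role_allowed(statement, role_arn, passed_to_service):
    """Return True only if both the resource pattern and the service condition match."""
    resource_ok = fnmatchcase(role_arn, statement["Resource"])
    expected = statement["Condition"]["StringEquals"]["iam:PassedToService"]
    return resource_ok and passed_to_service == expected


tenant_role = "arn:aws:iam::123456789012:role/acme-dev-a001-role-sagemaker-exec"
admin_role = "arn:aws:iam::123456789012:role/Admin"

print(pass_role_allowed(PASS_ROLE_STATEMENT, tenant_role, "sagemaker.amazonaws.com"))
print(pass_role_allowed(PASS_ROLE_STATEMENT, tenant_role, "lambda.amazonaws.com"))
print(pass_role_allowed(PASS_ROLE_STATEMENT, admin_role, "sagemaker.amazonaws.com"))
```

Both guards are needed: the condition alone would let a runner hand any role to SageMaker, and the resource pattern alone would let it hand a tenant role to any service.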

GitHub Actions Example:

- name: Configure AWS Credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/acme-dev-a001-role-ci-runner
    aws-region: us-west-2

- name: Login to ECR
  run: |
    aws ecr get-login-password --region us-west-2 | \
      docker login --username AWS --password-stdin \
      123456789012.dkr.ecr.us-west-2.amazonaws.com

- name: Build and Push Training Image
  run: |
    docker build -t acme-dev-a001-ml-training:${{ github.sha }} .
    docker tag acme-dev-a001-ml-training:${{ github.sha }} \
      123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}
    docker push \
      123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}

- name: Deploy SageMaker Pipeline
  run: |
    python deploy_pipeline.py \
      --pipeline-name acme-dev-a001-fraud-detection \
      --role-arn ${{ secrets.SAGEMAKER_ROLE_ARN }} \
      --image-uri 123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}

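- name: Start Pipeline Execution
  env:
    # Image pushed in the build step above
    IMAGE_URI: 123456789012.dkr.ecr.us-west-2.amazonaws.com/acme-dev-a001-ml-training:${{ github.sha }}
  run: |
    aws sagemaker start-pipeline-execution \
      --pipeline-name acme-dev-a001-fraud-detection \
      --pipeline-parameters '[{"Name":"ImageUri","Value":"'"$IMAGE_URI"'"}]'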

Compliance Note: In regulated industries, auditors can trace:

Human actions → IAM user CloudTrail logs (project-dev)
Automated actions → IAM role CloudTrail logs (project-ci)

This separation supports the audit-trail and accountability controls expected under SOC 2, HIPAA, and PCI DSS.
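The human-versus-automation split shows up in the `userIdentity.type` field of each CloudTrail record: "IAMUser" for users and "AssumedRole" for role sessions. The sketch below groups records on that field. The sample events are invented for illustration, and the mapping of "AssumedRole" to automation holds only under this guide's convention that project-ci is attached to roles and project-dev to users.

```python
from collections import defaultdict


def classify_actor(event):
    """Map a CloudTrail record to 'human', 'automation', or 'other' by identity type."""
    kind = event.get("userIdentity", {}).get("type")
    if kind == "IAMUser":
        return "human"
    if kind == "AssumedRole":
        return "automation"
    return "other"


def group_by_actor(events):
    """Group event names by the kind of principal that performed them."""
    grouped = defaultdict(list)
    for event in events:
        grouped[classify_actor(event)].append(event["eventName"])
    return dict(grouped)


# Invented sample records, trimmed to the fields used here.
sample_events = [
    {
        "eventName": "UpdatePipeline",
        "userIdentity": {
            "type": "IAMUser",
            "arn": "arn:aws:iam::123456789012:user/tom",
        },
    },
    {
        "eventName": "StartPipelineExecution",
        "userIdentity": {
            "type": "AssumedRole",
            "arn": "arn:aws:sts::123456789012:assumed-role/acme-dev-a001-role-ci-runner/gha",
        },
    },
]

print(group_by_actor(sample_events))
```

If humans in your organization sign in through federated or SSO roles, they will also appear as "AssumedRole", so the role name would need to be inspected as well.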

Level 4: project-full¶

⚠️ Reference Pattern — Not Generated by sec-provisioner

This policy requires a specific project name in the resource ARN (e.g., fraud-detection, recommendation-engine). Since project names are not known at platform provisioning time, this policy is not generated by the sec-provisioner. It is documented here as a reference pattern to be applied during project onboarding when the project name is known.

Purpose: Full pipeline control for human team members working on a specific ML project

Principal: Human (project engineers and data scientists)

Typical Users:

ML engineers (project-focused)
Data scientists (running experiments)
Project teams (isolated access)

Assignment: Attached to project-specific IAM groups created during project onboarding

How It Differs from project-ci (Level 3):

Narrower resource scope: project-ci is scoped to the tenant namespace ({company_prefix}-{env}-{tenant_id}-*), while project-full is scoped to a single project’s pipelines ({company_prefix}-{env}-{project}-*)
Different principal: project-ci is for automated CI/CD runners, project-full is for humans
Removes runner-specific actions: no iam:PassRole, no SSM/Secrets access, no S3/ECR asset access
Keeps step-level debugging: ListPipelineExecutionSteps (also granted to project-ci) for interactive troubleshooting
Keeps discovery: ListPipelines (also granted to project-ci) so humans can browse their project’s pipelines
Adds explicit Deny: the DenyCriticalActions Sid blocks sagemaker:DeletePipeline as a safety net (runners don’t need this because they never have delete in their Allow)

What You Can Do:

✅ Create and update pipelines for your project
✅ Start and stop pipeline executions
✅ View execution logs, metrics, and step-level details
✅ List and discover your project’s pipelines

What You Cannot Do:

❌ Access other teams’ pipelines
❌ Modify shared/platform pipelines
❌ Delete pipelines (explicit Deny)
❌ Pass IAM roles or access secrets (those are runner concerns)

Example Scenario:

The fraud-detection team needs to run their training pipeline without accessing the recommendation-engine team’s pipelines. Engineers create, update, and monitor pipelines interactively — but deletion requires a platform admin.

Resource Scope:

All SageMaker resources are scoped to the project name within the tenant’s naming convention. Replace {project} with the actual project name at onboarding time.

arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{project}-*

Example (for Edge Corp, prod environment, fraud-detection project):

arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*
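The naming convention above can be assembled by a small helper so that onboarding scripts do not build ARNs by hand. The following is a minimal sketch: the function name `pipeline_arn_pattern` and its validation rule (lowercase alphanumerics separated by hyphens) are illustrative assumptions, not part of the sec-provisioner.

```python
import re

# Assumed rule for name segments: lowercase alphanumerics separated by hyphens.
_SEGMENT = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def pipeline_arn_pattern(company_prefix: str, env: str, project: str) -> str:
    """Build arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{project}-*."""
    for label, value in (
        ("company_prefix", company_prefix),
        ("env", env),
        ("project", project),
    ):
        # Reject empty values and wildcards so a typo cannot widen the scope.
        if not _SEGMENT.match(value):
            raise ValueError(f"invalid {label}: {value!r}")
    # Region and account stay wildcarded, as in the pattern shown above.
    return f"arn:aws:sagemaker:*:*:pipeline/{company_prefix}-{env}-{project}-*"


print(pipeline_arn_pattern("edge", "prod", "fraud-detection"))
# prints: arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*
```

Rejecting wildcard characters in the inputs is a deliberate choice in this sketch: passing `*` as the project name would otherwise silently turn a project-scoped policy into a tenant-wide one.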

Sample Permissions:

[
  {
    "Sid": "PipelineManagement",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreatePipeline",
      "sagemaker:UpdatePipeline"
    ],
    "Resource": "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
  },
  {
    "Sid": "PipelineExecution",
    "Effect": "Allow",
    "Action": [
      "sagemaker:StartPipelineExecution",
      "sagemaker:StopPipelineExecution"
    ],
    "Resource": "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
  },
  {
    "Sid": "MonitoringAndVisibility",
    "Effect": "Allow",
    "Action": [
      "sagemaker:DescribePipeline",
      "sagemaker:DescribePipelineExecution",
      "sagemaker:ListPipelines",
      "sagemaker:ListPipelineExecutions",
      "sagemaker:ListPipelineExecutionSteps"
    ],
    "Resource": "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
  },
  {
    "Sid": "PipelineLogsAccess",
    "Effect": "Allow",
    "Action": [
      "logs:GetLogEvents",
      "logs:DescribeLogStreams"
    ],
    "Resource": "*"
  },
  {
    "Sid": "DenyCriticalActions",
    "Effect": "Deny",
    "Action": [
      "sagemaker:DeletePipeline"
    ],
    "Resource": "*"
  }
]

"Sid": "PipelineManagement"

Allows users to build and modify project-specific ML workflows. Restricted via the Resource ARN to only pipelines matching the project’s naming convention, preventing interference with other teams.

Policy Action	Description
sagemaker:CreatePipeline	Define a new sequence of ML steps (data prep, training, etc.) for this project.
sagemaker:UpdatePipeline	Modify existing pipeline definitions as project requirements evolve.

"Sid": "PipelineExecution"

Grants operational control to run or halt project experiments. Ensures data scientists can iterate on models without requiring admin intervention.

Policy Action	Description
sagemaker:StartPipelineExecution	Triggers a new run of the ML pipeline using specified data or parameters.
sagemaker:StopPipelineExecution	Allows engineers to manually kill a run if errors are detected, saving compute costs.

"Sid": "MonitoringAndVisibility"

Provides read access for interactive debugging and troubleshooting. Unlike project-ci (which only needs execution-level status), humans need step-level detail and pipeline discovery to work effectively.

Policy Action	Description
sagemaker:DescribePipeline	Retrieves pipeline metadata including ARN, name, creation time, status, and associated IAM identity.
sagemaker:DescribePipelineExecution	Returns details about a specific execution such as ARN, status, creation time, and failure reasons.
sagemaker:ListPipelines	Discover all pipelines within the project scope. Humans need this to browse and select pipelines interactively.
sagemaker:ListPipelineExecutions	View the history of all runs for a pipeline. Lists execution summaries for troubleshooting and tracking.
sagemaker:ListPipelineExecutionSteps	Inspect individual steps within an execution. Essential for humans debugging which step failed and why.

"Sid": "PipelineLogsAccess"

Separated from MonitoringAndVisibility because CloudWatch log group names are generated by AWS at runtime and cannot be scoped to a project prefix. Uses Resource: * out of necessity, not by choice.

Policy Action	Description
logs:GetLogEvents	Retrieves log events from a CloudWatch Logs log stream, allowing filtering by time range.
logs:DescribeLogStreams	Lists log streams within a log group, with options to filter by prefix or order by last event time.

"Sid": "DenyCriticalActions"

Explicit safety net to prevent accidental or unauthorized deletion. An explicit Deny always overrides an Allow in IAM, ensuring that no other policy — including any future policy changes — can grant deletion rights to this group.

Policy Action	Description
sagemaker:DeletePipeline	Specifically blocked to ensure that even project members cannot permanently remove pipeline infrastructure. Deletion is reserved for Level 5: platform-full.
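To see why the explicit Deny acts as a safety net, the evaluation order can be modelled in a few lines. This is a simplified sketch of IAM's "explicit Deny wins" rule only: it ignores conditions, permission boundaries, SCPs, and case-insensitive action matching, and the `accidental_grant` policy is a hypothetical example of a later misconfiguration.

```python
import re


def _matches(pattern: str, value: str) -> bool:
    # IAM-style wildcards: '*' matches any run of characters, '?' exactly one.
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, value) is not None


def evaluate(statements, action: str, resource: str) -> str:
    """Return 'Deny', 'Allow', or 'ImplicitDeny' for a single request."""
    decision = "ImplicitDeny"
    for stmt in statements:
        resources = stmt["Resource"]
        if isinstance(resources, str):
            resources = [resources]
        hit = any(_matches(a, action) for a in stmt["Action"]) and any(
            _matches(r, resource) for r in resources
        )
        if not hit:
            continue
        if stmt["Effect"] == "Deny":
            return "Deny"  # an explicit Deny wins over any Allow
        decision = "Allow"
    return decision


PIPELINES = "arn:aws:sagemaker:*:*:pipeline/edge-prod-fraud-detection-*"
project_full = [
    {
        "Effect": "Allow",
        "Action": ["sagemaker:CreatePipeline", "sagemaker:UpdatePipeline"],
        "Resource": PIPELINES,
    },
    {"Effect": "Deny", "Action": ["sagemaker:DeletePipeline"], "Resource": "*"},
]
# Hypothetical broad policy attached later by mistake.
accidental_grant = [{"Effect": "Allow", "Action": ["sagemaker:*"], "Resource": "*"}]

arn = "arn:aws:sagemaker:us-west-2:123456789012:pipeline/edge-prod-fraud-detection-train"
print(evaluate(project_full + accidental_grant, "sagemaker:DeletePipeline", arn))  # Deny
print(evaluate(project_full, "sagemaker:UpdatePipeline", arn))  # Allow
```

Even with the overly broad `sagemaker:*` grant in place, the delete request still resolves to Deny, which is the behaviour the DenyCriticalActions statement relies on.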

Level 5: platform-full¶

Purpose: Platform-wide pipeline management across all projects and tenants

Principal: Human (platform administrators)

Typical Users:

MLOps platform team
Pipeline infrastructure owners
Cross-project coordinators

Assignment: Platform admin IAM groups (e.g., {company_prefix}-{env}-group-platform-admins)

How It Differs from project-full (Level 4):

Scope breaks out from project to account-wide — Resource: * instead of project-scoped ARNs
Adds delete — sagemaker:DeletePipeline and codepipeline:DeletePipeline (only level that can delete)
Adds CodePipeline management — Levels 1-4 focus on SageMaker Pipelines; Level 5 adds full CI/CD pipeline control
Adds governance actions — approval gates, stage transitions, pipeline freezing
Adds PassRole — can assign IAM roles to pipelines (scoped to SageMaker and CodePipeline services)
No explicit Deny — this is the level where delete is intentionally allowed

What You Can Do:

✅ Manage all SageMaker pipelines across all projects and tenants
✅ Manage all CodePipeline CI/CD pipelines across the platform
✅ Create shared/platform pipelines
✅ Delete obsolete pipelines (SageMaker and CodePipeline)
✅ Approve/reject deployment gates
✅ Freeze and unfreeze pipeline stages
✅ Assign IAM roles to pipelines

What You Cannot Do:

❌ Nothing — full pipeline access across both SageMaker and CodePipeline

Example Scenario:

The MLOps team maintains a shared data preprocessing pipeline used by all ML projects and needs to update it with new validation steps. They also need to decommission a retired project’s pipelines and approve a production deployment gate.

Resource Scope:

Account-wide — no tenant or project scoping. Platform admins need cross-cutting access to manage the entire pipeline infrastructure.

Resource: "*"

Sample Permissions:

[
  {
    "Sid": "SageMakerPipelineFullAccess",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreatePipeline",
      "sagemaker:UpdatePipeline",
      "sagemaker:DeletePipeline",
      "sagemaker:DescribePipeline",
      "sagemaker:ListPipelines",
      "sagemaker:StartPipelineExecution",
      "sagemaker:StopPipelineExecution",
      "sagemaker:DescribePipelineExecution",
      "sagemaker:ListPipelineExecutions",
      "sagemaker:ListPipelineExecutionSteps"
    ],
    "Resource": "*"
  },
  {
    "Sid": "CodePipelineFullAccess",
    "Effect": "Allow",
    "Action": [
      "codepipeline:CreatePipeline",
      "codepipeline:UpdatePipeline",
      "codepipeline:DeletePipeline",
      "codepipeline:GetPipeline",
      "codepipeline:ListPipelines",
      "codepipeline:GetPipelineState",
      "codepipeline:GetPipelineExecution",
      "codepipeline:StartPipelineExecution",
      "codepipeline:StopPipelineExecution",
      "codepipeline:RetryStageExecution",
      "codepipeline:RollbackStage"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PipelineGovernance",
    "Effect": "Allow",
    "Action": [
      "codepipeline:PutApprovalResult",
      "codepipeline:DisableStageTransition",
      "codepipeline:EnableStageTransition"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PipelineLogsFullAccess",
    "Effect": "Allow",
    "Action": [
      "logs:GetLogEvents",
      "logs:DescribeLogStreams",
      "logs:DescribeLogGroups",
      "logs:FilterLogEvents"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToPipelineServices",
    "Effect": "Allow",
    "Action": [
      "iam:PassRole"
    ],
    "Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": [
          "sagemaker.amazonaws.com",
          "codepipeline.amazonaws.com"
        ]
      }
    }
  }
]

"Sid": "SageMakerPipelineFullAccess"

Full control over all SageMaker Pipelines across every project and tenant. This is the only level that includes sagemaker:DeletePipeline — all lower levels either omit it or explicitly deny it. Platform admins use this to manage the complete lifecycle of ML pipelines including decommissioning retired projects.

Policy Action	Description
sagemaker:CreatePipeline	Create new SageMaker pipeline definitions for any project or shared infrastructure.
sagemaker:UpdatePipeline	Modify any existing pipeline definition across the platform.
sagemaker:DeletePipeline	Permanently remove obsolete or retired pipelines. Only available at this level.
sagemaker:DescribePipeline	Retrieve metadata for any pipeline including ARN, status, and associated IAM identity.
sagemaker:ListPipelines	Discover all pipelines across the entire account for cross-project visibility.
sagemaker:StartPipelineExecution	Trigger execution of any pipeline for cross-project coordination or incident response.
sagemaker:StopPipelineExecution	Halt any running pipeline execution across the platform.
sagemaker:DescribePipelineExecution	Inspect execution details including status, timing, and failure reasons for any pipeline.
sagemaker:ListPipelineExecutions	View execution history across all pipelines for platform-wide monitoring.
sagemaker:ListPipelineExecutionSteps	Inspect step-level details within any execution for deep troubleshooting.

"Sid": "CodePipelineFullAccess"

Full control over all CodePipeline CI/CD pipelines. This extends platform-full beyond SageMaker into the CI/CD layer, giving the MLOps team end-to-end pipeline management from source code through to model deployment.

Policy Action	Description
codepipeline:CreatePipeline	Create new CI/CD pipelines for any project or shared infrastructure.
codepipeline:UpdatePipeline	Modify the structure or settings of any existing pipeline.
codepipeline:DeletePipeline	Permanently remove obsolete CI/CD pipeline configurations.
codepipeline:GetPipeline	View the JSON structure and configuration of any pipeline.
codepipeline:ListPipelines	List all CI/CD pipelines in the account for platform-wide visibility.
codepipeline:GetPipelineState	Real-time view of stage and action status (Succeeded, In Progress, Failed).
codepipeline:GetPipelineExecution	View details and history of a specific execution run.
codepipeline:StartPipelineExecution	Manually trigger any pipeline for cross-project coordination.
codepipeline:StopPipelineExecution	Force-stop a running pipeline mid-process.
codepipeline:RetryStageExecution	Restart a failed stage without rerunning the entire pipeline.
codepipeline:RollbackStage	Revert a stage to a previous successful state for incident recovery.

"Sid": "PipelineGovernance"

Governance actions for deployment control. Allows platform admins to approve or reject deployment gates, freeze pipeline stages during incidents, and resume flow when resolved. Separated from CodePipelineFullAccess because these are administrative/governance actions, not pipeline CRUD.

Policy Action	Description
codepipeline:PutApprovalResult	Approve or reject a manual approval gate to move a deployment forward.
codepipeline:DisableStageTransition	Freeze a pipeline stage to prevent progression (e.g., during an incident or change freeze).
codepipeline:EnableStageTransition	Re-enable flow between stages after a freeze is lifted.
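The three governance actions map onto three CodePipeline API calls. The sketch below wraps them in small functions that accept an already-constructed client (for example `boto3.client("codepipeline")`); the function names are illustrative, freezing only inbound transitions is an assumption, and the approval `token` is expected to come from the pipeline state of the pending approval action.

```python
def freeze_stage(codepipeline, pipeline_name: str, stage_name: str, reason: str) -> None:
    """Stop executions from entering a stage (codepipeline:DisableStageTransition)."""
    codepipeline.disable_stage_transition(
        pipelineName=pipeline_name,
        stageName=stage_name,
        transitionType="Inbound",
        reason=reason,
    )


def unfreeze_stage(codepipeline, pipeline_name: str, stage_name: str) -> None:
    """Resume flow into a stage (codepipeline:EnableStageTransition)."""
    codepipeline.enable_stage_transition(
        pipelineName=pipeline_name,
        stageName=stage_name,
        transitionType="Inbound",
    )


def decide_gate(codepipeline, pipeline_name: str, stage_name: str, action_name: str,
                token: str, approved: bool, summary: str) -> None:
    """Approve or reject a manual approval action (codepipeline:PutApprovalResult)."""
    codepipeline.put_approval_result(
        pipelineName=pipeline_name,
        stageName=stage_name,
        actionName=action_name,
        result={
            "summary": summary,
            "status": "Approved" if approved else "Rejected",
        },
        token=token,
    )
```

Passing the client in, rather than creating it inside each function, keeps the helpers easy to exercise with a stub and leaves credential and region handling to the caller.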

"Sid": "PipelineLogsFullAccess"

Full CloudWatch Logs access for platform-wide pipeline troubleshooting. Adds DescribeLogGroups and FilterLogEvents beyond what lower levels have — platform admins need to discover log groups across all projects and search across log streams.

Policy Action	Description
logs:GetLogEvents	Retrieve log events from any pipeline’s log stream.
logs:DescribeLogStreams	List log streams within any log group for cross-project investigation.
logs:DescribeLogGroups	Discover all log groups across the account — needed for platform-wide visibility.
logs:FilterLogEvents	Search across log streams within a log group — essential for incident investigation across projects.

"Sid": "PassRoleToPipelineServices"

Allows platform admins to assign IAM roles to both SageMaker and CodePipeline services. Scoped to roles matching the platform’s naming convention and conditioned to only pass roles to pipeline services — prevents using this permission to escalate privileges to other AWS services.

Policy Action	Description
iam:PassRole	Assign IAM service roles to pipelines. Scoped to `{company_prefix}-{env}-*-role-*` and conditioned to `sagemaker.amazonaws.com` and `codepipeline.amazonaws.com` only.
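The combined effect of the resource pattern and the `iam:PassedToService` condition can be checked with a short local model. This is a sketch under simplifying assumptions: it mirrors only this one statement, the helper names are hypothetical, and the role names in the usage lines are made up for illustration.

```python
import re

# Services listed in the statement's iam:PassedToService condition.
ALLOWED_SERVICES = {"sagemaker.amazonaws.com", "codepipeline.amazonaws.com"}


def _matches(pattern: str, value: str) -> bool:
    # IAM-style wildcards: '*' matches any run of characters, '?' exactly one.
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, value) is not None


def can_pass_role(role_arn: str, service: str, company_prefix: str, env: str) -> bool:
    """True only if both the role name pattern and the target service match."""
    pattern = f"arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*"
    return service in ALLOWED_SERVICES and _matches(pattern, role_arn)


# Hypothetical role names, used only to exercise the check.
platform_role = "arn:aws:iam::123456789012:role/acme-dev-a001-role-sagemaker-exec"
print(can_pass_role(platform_role, "sagemaker.amazonaws.com", "acme", "dev"))  # True
print(can_pass_role(platform_role, "lambda.amazonaws.com", "acme", "dev"))     # False
```

Both checks must pass: a correctly named role still cannot be handed to a service outside the condition, which is what limits the escalation path described above.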

Inference Policies¶

Inference policies control access to deployed ML models and prediction services. Unlike S3, ECR, or Pipeline policies which each target a single AWS service, inference spans multiple services — each with its own permission model and use cases.

Service	Use Case	Levels
SageMaker Inference	Real-time endpoints, batch transform, async/serverless inference, autoscaling	4
Lambda Inference	Lightweight model serving, custom inference containers, event-driven predictions	3
Bedrock Inference	Foundation model invocation, cross-region inference, provisioned throughput	3

Each service gets the number of levels its permission model actually needs — no artificial uniformity.

SageMaker Inference¶

SageMaker Inference policies control access to deployed ML models and endpoints. The level progression is invoke-centric: who can call the model, and in which environment.

Level 1: read-only¶

Purpose: Read-only access for monitoring the health of SageMaker endpoints, with no permission to invoke predictions (and therefore no invocation-driven costs).

Principal: Human (auditors, monitoring teams)

Typical Users:

Compliance auditors
QA teams
Monitoring dashboards
Cost optimization analysts
New team members learning the platform

What You Can Do:

✅ View endpoint status and health
✅ List all endpoints and configurations
✅ See endpoint metadata (instance type, model version)
✅ Monitor endpoint metrics (latency, error rates)
✅ Check autoscaling settings

What You Cannot Do:

❌ Invoke endpoints (send prediction requests)
❌ Create or modify endpoints
❌ Delete endpoints

Example Scenario:

Sarah is a QA engineer who needs to verify that all production endpoints are using the approved instance types and have autoscaling enabled. She needs to see endpoint configurations but doesn’t need to send prediction requests.

Sample Permissions:

[
    {
        "Sid": "SageMakerEndpointReadOnly",
        "Effect": "Allow",
        "Action": [
            "sagemaker:ListEndpoints",
            "sagemaker:DescribeEndpoint",
            "sagemaker:ListEndpointConfigs",
            "sagemaker:DescribeEndpointConfig",
            "sagemaker:ListModels",
            "sagemaker:DescribeModel",
            "sagemaker:DescribeModelPackage",
            "sagemaker:ListModelPackages"
        ],
        "Resource": "*"
    },
    {
        "Sid": "CloudWatchMetricsReadOnly",
        "Effect": "Allow",
        "Action": [
            "cloudwatch:GetMetricData",
            "cloudwatch:GetMetricStatistics",
            "cloudwatch:ListMetrics"
        ],
        "Resource": "*"
    },
    {
        "Sid": "AutoScalingReadOnly",
        "Effect": "Allow",
        "Action": [
            "application-autoscaling:DescribeScalableTargets",
            "application-autoscaling:DescribeScalingPolicies"
        ],
        "Resource": "*"
    },
    {
        "Sid": "ExplicitDenyInference",
        "Effect": "Deny",
        "Action": [
            "sagemaker:InvokeEndpoint",
            "sagemaker:InvokeEndpointAsync"
        ],
        "Resource": "*"
    }
]

"Sid": "SageMakerEndpointReadOnly"

Grants view-only access to SageMaker hosting resources, allowing the user to see lists and configurations for models and endpoints.

Policy Action	Description
sagemaker:ListEndpoints	Returns a list of all existing endpoints in your account, including their names and ARNs.
sagemaker:DescribeEndpoint	Returns detailed information about a specific endpoint (e.g., status, configuration name).
sagemaker:ListEndpointConfigs	Lists all endpoint configurations (the blueprints for endpoints).
sagemaker:DescribeEndpointConfig	Returns details of an endpoint configuration, such as instance types and model names.
sagemaker:ListModels	Lists all models currently created in SageMaker.
sagemaker:DescribeModel	Returns details about a model, including the container image and execution role.
sagemaker:DescribeModelPackage	Provides information about a specific versioned model package.
sagemaker:ListModelPackages	Lists all available model packages or groups.

"Sid": "CloudWatchMetricsReadOnly"

Provides access to view performance data and statistics for monitoring the health and usage of the resources.

Policy Action	Description
cloudwatch:GetMetricData	Retrieves raw data points for various metrics across multiple resources.
cloudwatch:GetMetricStatistics	Gets specific statistical data (Average, Sum, Max, etc.) for a metric.
cloudwatch:ListMetrics	Lists all the valid metric names available to be viewed.

"Sid": "AutoScalingReadOnly"

Allows the user to see the scaling configurations and policies applied to the endpoints.

Policy Action	Description
application-autoscaling:DescribeScalableTargets	Shows which resources (endpoints) are set up to scale automatically.
application-autoscaling:DescribeScalingPolicies	Shows the specific rules that trigger a scale-up or scale-down event.

"Sid": "ExplicitDenyInference"

Specifically blocks the ability to actually run data through an endpoint for predictions, ensuring the policy remains “read-only.”

Policy Action	Description
sagemaker:InvokeEndpoint	(Denied) The action required to send a synchronous request to an endpoint for a prediction.
sagemaker:InvokeEndpointAsync	(Denied) The action required to send an asynchronous request for long-running inferences.

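Cost Benefit: Without InvokeEndpoint permission, this principal cannot send traffic to an endpoint, so it cannot add invocation-driven costs. This suits monitoring and audit use cases. (Running endpoints still incur their normal hosting charges.)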
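A check like the one in Sarah's scenario needs only the list and describe actions granted at this level. The following is a minimal sketch that takes an already-constructed SageMaker client (for example `boto3.client("sagemaker")`); the function name and the shape of the returned report are assumptions for illustration.

```python
def audit_endpoints(sagemaker, approved_instance_types):
    """Return {endpoint_name: [unapproved instance types]} using read-only calls."""
    findings = {}
    paginator = sagemaker.get_paginator("list_endpoints")  # sagemaker:ListEndpoints
    for page in paginator.paginate():
        for endpoint in page["Endpoints"]:
            name = endpoint["EndpointName"]
            # sagemaker:DescribeEndpoint tells us which config the endpoint uses.
            config_name = sagemaker.describe_endpoint(EndpointName=name)["EndpointConfigName"]
            # sagemaker:DescribeEndpointConfig exposes the instance types.
            config = sagemaker.describe_endpoint_config(EndpointConfigName=config_name)
            unapproved = [
                variant["InstanceType"]
                for variant in config["ProductionVariants"]
                # Serverless variants carry no InstanceType, so skip them.
                if "InstanceType" in variant
                and variant["InstanceType"] not in approved_instance_types
            ]
            if unapproved:
                findings[name] = unapproved
    return findings
```

An empty result means every instance-backed variant uses an approved type. Because the function never calls the runtime API, it works under the ExplicitDenyInference statement.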

Level 1-prod: read-only-invoke¶

Purpose: Read-only monitoring access plus invoke permissions in every environment, including production. Designed for non-technical users who consume model predictions through dashboards and applications.

Principal: Human (business consumers, product managers, analysts)

Typical Users:

Business consumers using ML-powered dashboards
Product managers validating model outputs
Analysts running predictions for business decisions
Applications calling endpoints on behalf of business users

How It Differs from read-only (Level 1):

Adds invoke — can send prediction requests to endpoints in all environments including production
Same read-only baseline — identical list/describe/monitor permissions
No endpoint lifecycle — cannot create, modify, or delete endpoints

What You Can Do:

✅ Everything in read-only (Level 1), PLUS:
✅ Invoke sandbox, dev, staging, and production endpoints
✅ Send real-time and async prediction requests
✅ Consume ML models through applications and dashboards

What You Cannot Do:

❌ Create, modify, or delete endpoints or endpoint configs
❌ Register or manage models
❌ Access training jobs or notebooks

Example Scenario:

Lisa is a product manager who uses an internal dashboard powered by a fraud detection model. The dashboard calls the production SageMaker endpoint to score transactions in real-time. Lisa needs invoke access to production but should never modify the endpoint or model behind it.

Sample Permissions:

[
    {
        "Sid": "SageMakerEndpointReadOnly",
        "Effect": "Allow",
        "Action": [
            "sagemaker:ListEndpoints",
            "sagemaker:DescribeEndpoint",
            "sagemaker:ListEndpointConfigs",
            "sagemaker:DescribeEndpointConfig",
            "sagemaker:ListModels",
            "sagemaker:DescribeModel",
            "sagemaker:DescribeModelPackage",
            "sagemaker:ListModelPackages"
        ],
        "Resource": "*"
    },
    {
        "Sid": "CloudWatchMetricsReadOnly",
        "Effect": "Allow",
        "Action": [
            "cloudwatch:GetMetricData",
            "cloudwatch:GetMetricStatistics",
            "cloudwatch:ListMetrics"
        ],
        "Resource": "*"
    },
    {
        "Sid": "AutoScalingReadOnly",
        "Effect": "Allow",
        "Action": [
            "application-autoscaling:DescribeScalableTargets",
            "application-autoscaling:DescribeScalingPolicies"
        ],
        "Resource": "*"
    },
    {
        "Sid": "InvokeAllEnvironments",
        "Effect": "Allow",
        "Action": [
            "sagemaker:InvokeEndpoint",
            "sagemaker:InvokeEndpointAsync"
        ],
        "Resource": "arn:aws:sagemaker:*:*:endpoint/*"
    }
]

"Sid": "SageMakerEndpointReadOnly"

Identical to Level 1. Grants view-only access to SageMaker hosting resources.

"Sid": "CloudWatchMetricsReadOnly"

Identical to Level 1. Provides access to view performance data and statistics.

"Sid": "AutoScalingReadOnly"

Identical to Level 1. Allows the user to see scaling configurations.

"Sid": "InvokeAllEnvironments"

Allows sending prediction requests to endpoints in all environments (sandbox, dev, staging, production). Scoped to endpoint/* — no access to endpoint configs, models, or training resources.

Policy Action	Description
sagemaker:InvokeEndpoint	Sends a real-time inference request to any running endpoint to get a prediction.
sagemaker:InvokeEndpointAsync	Sends an inference request to any asynchronous endpoint (used for large payloads or long processing times).

Key Difference from Level 1: No ExplicitDenyInference Sid. Instead, invoke is explicitly allowed across all environments. The read-only baseline remains identical.

Cost Consideration: Unlike Level 1, this level incurs inference costs per invocation. Use API throttling and service quotas to manage cost risk rather than IAM restrictions.

Level 2: dev-invoke¶

Purpose: Test models in sandbox, development, and staging environments

Principal: Human (data scientists, ML engineers)

Typical Users:

Data scientists (A/B testing)
ML engineers (model validation)
QA teams (integration testing)
Development applications

How It Differs from read-only (Level 1):

Adds invoke — can send prediction requests to sandbox, dev, and staging endpoints
Adds model registration — can register trained models in SageMaker Model Registry
Environment-scoped — restricted to endpoints whose names match *-sandbox-*, *-dev-*, or *-staging-*
Sandbox/dev/staging endpoint lifecycle — can create, modify, and delete endpoints in non-production environments
Production lifecycle blocked — explicit deny on creating, updating, or deleting *-prod-* endpoints and endpoint configs

What You Can Do:

✅ Everything in read-only, PLUS:
✅ Invoke sandbox and dev endpoints for testing
✅ Invoke staging endpoints for validation
✅ Send test prediction requests
✅ Validate model responses
✅ Register trained models in SageMaker Model Registry
✅ Create model package groups for organizing model versions
✅ Create model definitions (link artifacts + container)
✅ Create, update, and delete sandbox/dev/staging endpoints
✅ Create endpoint configs for non-production environments

What You Cannot Do:

❌ Invoke production endpoints
❌ Create, modify, or delete production (*-prod-*) endpoints
❌ Create production endpoint configs
❌ Approve or reject model packages (governance responsibility)

Example Scenario:

Marcus is a data scientist who deployed two fraud detection models to the staging environment. He needs to send test transactions to both endpoints to compare their accuracy before promoting the winner to production.

Sample Permissions:

[
  {
    "Sid": "SageMakerReadOnlyAccess",
    "Effect": "Allow",
    "Action": [
      "sagemaker:ListEndpoints",
      "sagemaker:DescribeEndpoint",
      "sagemaker:ListEndpointConfigs",
      "sagemaker:DescribeEndpointConfig",
      "sagemaker:ListModels",
      "sagemaker:DescribeModel",
      "sagemaker:DescribeModelPackage",
      "sagemaker:ListModelPackages",
      "sagemaker:GetSearchSuggestions"
    ],
    "Resource": "*"
  },
  {
    "Sid": "SageMakerInvokeDevStagingEndpoints",
    "Effect": "Allow",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "Resource": [
      "arn:aws:sagemaker:*:*:endpoint/*-sandbox-*",
      "arn:aws:sagemaker:*:*:endpoint/*-dev-*",
      "arn:aws:sagemaker:*:*:endpoint/*-staging-*"
    ]
  },
  {
    "Sid": "EndpointLifecycleNonProd",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:UpdateEndpoint",
      "sagemaker:DeleteEndpoint",
      "sagemaker:DeleteEndpointConfig"
    ],
    "Resource": [
      "arn:aws:sagemaker:*:*:endpoint/*-sandbox-*",
      "arn:aws:sagemaker:*:*:endpoint/*-dev-*",
      "arn:aws:sagemaker:*:*:endpoint/*-staging-*",
      "arn:aws:sagemaker:*:*:endpoint-config/*-sandbox-*",
      "arn:aws:sagemaker:*:*:endpoint-config/*-dev-*",
      "arn:aws:sagemaker:*:*:endpoint-config/*-staging-*"
    ]
  },
  {
    "Sid": "ModelRegistration",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateModel",
      "sagemaker:CreateModelPackage",
      "sagemaker:CreateModelPackageGroup"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ExplicitDenyProductionAndLifecycle",
    "Effect": "Deny",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:UpdateEndpoint",
      "sagemaker:DeleteEndpoint",
      "sagemaker:DeleteEndpointConfig"
    ],
    "Resource": [
      "arn:aws:sagemaker:*:*:endpoint/*-prod-*",
      "arn:aws:sagemaker:*:*:endpoint-config/*-prod-*"
    ]
  }
]

"Sid": "SageMakerReadOnlyAccess"

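Grants view-only access to SageMaker hosting resources (endpoints, endpoint configurations, models, and model packages) across the account, plus search suggestions.

Policy Action	Description
sagemaker:ListEndpoints	Returns a list of all existing endpoints in your account, including their names and ARNs.
sagemaker:DescribeEndpoint	Returns detailed information about a specific endpoint (e.g., status, configuration name).
sagemaker:ListEndpointConfigs	Lists all endpoint configurations (the blueprints for endpoints).
sagemaker:DescribeEndpointConfig	Returns details of an endpoint configuration, such as instance types and model names.
sagemaker:ListModels	Lists all models currently created in SageMaker.
sagemaker:DescribeModel	Returns details about a model, including the container image and execution role.
sagemaker:DescribeModelPackage	Provides information about a specific versioned model package.
sagemaker:ListModelPackages	Lists all available model packages or groups.
sagemaker:GetSearchSuggestions	Provides auto-complete suggestions for SageMaker search queries.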

"Sid": "SageMakerInvokeDevStagingEndpoints"

Allows the user to send data to SageMaker endpoints whose names contain -sandbox-, -dev-, or -staging-.

Policy Action	Description
sagemaker:InvokeEndpoint	Sends a real-time inference request to a running endpoint to get a prediction.
sagemaker:InvokeEndpointAsync	Sends an inference request to an asynchronous endpoint (used for large payloads or long processing times).

"Sid": "EndpointLifecycleNonProd"

Allows creating, updating, and deleting endpoints and endpoint configurations in non-production environments (sandbox, dev, staging). This enables ML engineers and data scientists to test real-time inference latency, validate inference logic, and iterate on endpoint configurations before handing off to MLOps for production deployment.

Policy Action	Description
sagemaker:CreateEndpoint	Creates a new endpoint using a specific endpoint configuration. Scoped to sandbox/dev/staging naming patterns.
sagemaker:CreateEndpointConfig	Defines the hardware (instance type, count) and model specifications for an endpoint. Scoped to non-production.
sagemaker:UpdateEndpoint	Deploys a new model or configuration to an existing non-production endpoint.
sagemaker:DeleteEndpoint	Shuts down and removes a non-production endpoint to stop incurring costs.
sagemaker:DeleteEndpointConfig	Removes an endpoint configuration that is no longer needed in non-production.

"Sid": "ModelRegistration"

Allows data scientists to register trained models in the SageMaker Model Registry after training. This is a development activity — the model sits in the registry awaiting approval from MLOps or governance teams before production deployment.

Policy Action	Description
sagemaker:CreateModel	Creates a model definition in SageMaker by specifying the Docker container image, model artifacts (from S3), and inference code. This does not deploy the model — it only defines it.
sagemaker:CreateModelPackage	Registers a versioned model package in the Model Registry. This is the primary action for submitting a trained model for review and approval.
sagemaker:CreateModelPackageGroup	Creates a model package group to organize related model versions (e.g., all versions of a fraud detection model). Typically done once per model project.

"Sid": "ExplicitDenyProductionAndLifecycle"

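A strict guardrail that blocks lifecycle actions (create, update, delete) on production (*-prod-*) endpoints and their endpoint configurations. Production invocation is not listed in this Deny; it is blocked because no Allow statement in this policy covers *-prod-* endpoints. Lifecycle actions in non-production environments (sandbox, dev, staging) remain allowed.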

Policy Action Denied	Description
sagemaker:CreateEndpoint	(Denied) Creates a new endpoint using a specific endpoint configuration.
sagemaker:CreateEndpointConfig	(Denied) Defines the hardware and model specifications for an endpoint.
sagemaker:UpdateEndpoint	(Denied) Deploys a new model or configuration to an existing endpoint.
sagemaker:DeleteEndpoint	(Denied) Shuts down and removes a production endpoint to stop incurring costs.
sagemaker:DeleteEndpointConfig	(Denied) Removes a production endpoint configuration.

Python Example:

import boto3
import json

runtime = boto3.client('sagemaker-runtime')

# Test a dev endpoint (the name matches the *-dev-* pattern)
response = runtime.invoke_endpoint(
    EndpointName='acme-dev-fraud-detection-v2',
    ContentType='application/json',
    Body=json.dumps({
        'transaction_amount': 1500.00,
        'merchant_category': 'electronics'
    })
)

prediction = json.loads(response['Body'].read())
print(f"Fraud probability: {prediction['fraud_score']}")

Safety Feature: This policy grants no invoke permission on *-prod-* endpoints, so test traffic cannot reach production by accident. This prevents costly mistakes and data contamination.
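A side-by-side comparison like the one in Marcus's scenario can be sketched as a loop over candidate endpoints. This is a minimal illustration: it takes an already-constructed runtime client (for example `boto3.client("sagemaker-runtime")`), the function name is hypothetical, and the response is assumed to be a JSON document whose fields depend on the deployed model.

```python
import json


def compare_endpoints(runtime, endpoint_names, payload):
    """Send the same payload to each endpoint and collect the parsed responses."""
    body = json.dumps(payload)
    results = {}
    for name in endpoint_names:
        # Client-side guard only; IAM remains the real enforcement point.
        if "-prod-" in name:
            raise ValueError(f"refusing to send test traffic to {name}")
        response = runtime.invoke_endpoint(
            EndpointName=name,
            ContentType="application/json",
            Body=body,
        )
        # Assumes the model returns JSON; the schema is model-specific.
        results[name] = json.loads(response["Body"].read())
    return results
```

The name check fails fast with a readable error before any request is sent. It complements the policy rather than replacing it, since a request to a *-prod-* endpoint would be rejected by IAM anyway under this level.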

Level 3: prod-invoke¶

Purpose: Production applications invoking production models

Principal: Machine (backend services, APIs) or Human (production support)

Typical Users:

Backend API services
Production web applications
Mobile app backends
Real-time fraud detection systems
Customer-facing chatbots

How It Differs from dev-invoke (Level 2):

Switches environment scope — production endpoints only, no dev/staging
Typically assigned to service roles — production apps use IAM roles, not user credentials
Higher accountability — every invocation serves real customers

What You Can Do:

✅ List and describe endpoints (discovery only), PLUS:
✅ Invoke production endpoints only
✅ Send real customer prediction requests
✅ Receive model responses for business logic

What You Cannot Do:

❌ Invoke dev or staging endpoints
❌ Create or modify endpoints
❌ Delete endpoints

Example Scenario:

The fraud detection API service receives transaction requests from the payment gateway. For each transaction, it calls the production fraud model endpoint and blocks transactions with fraud scores above 0.85.

Sample Permissions:

[
  {
    "Sid": "AllowProductionEndpointDiscovery",
    "Effect": "Allow",
    "Action": [
      "sagemaker:DescribeEndpoint",
      "sagemaker:ListEndpoints"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AllowProductionModelInvocation",
    "Effect": "Allow",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "Resource": "arn:aws:sagemaker:*:*:endpoint/*-prod-*"
  },
  {
    "Sid": "DenyNonProductionInvocation",
    "Effect": "Deny",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "NotResource": "arn:aws:sagemaker:*:*:endpoint/*-prod-*"
  },
  {
    "Sid": "DenyEndpointModifications",
    "Effect": "Deny",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:UpdateEndpoint",
      "sagemaker:DeleteEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:DeleteEndpointConfig"
    ],
    "Resource": "*"
  }
]

"Sid": "AllowProductionEndpointDiscovery"

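Enables the principal to list endpoints and view their status and configuration. Because the statement uses Resource: *, discovery is not limited to endpoints that follow the -prod- naming convention; only invocation is.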

Policy Action	Description
sagemaker:DescribeEndpoint	Returns detailed information about an endpoint, such as its current status and configuration name.
sagemaker:ListEndpoints	Lists the SageMaker endpoints in the account, allowing the user to see what is available.

"Sid": "AllowProductionModelInvocation"

Grants the core permission to send data to and receive predictions from production-ready models.

Policy Action	Description
sagemaker:InvokeEndpoint	Sends a synchronous request to an endpoint for low-latency, real-time machine learning inferences.
sagemaker:InvokeEndpointAsync	Sends an inference request to an asynchronous endpoint, suitable for large payloads or long processing times.

"Sid": "DenyNonProductionInvocation"

A guardrail that explicitly prevents this principal from invoking any endpoint whose name does not match the *-prod-* pattern, preventing accidental cross-environment leakage. The match is on the endpoint name in the ARN, not on tags.

Policy Action Denied	Description
sagemaker:InvokeEndpoint	(Denied) Sends a synchronous request to an endpoint for low-latency, real-time machine learning inferences.
sagemaker:InvokeEndpointAsync	(Denied) Sends an inference request to an asynchronous endpoint, suitable for large payloads or long processing times.
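
Pattern Check (illustrative):

The interaction between the Allow and Deny statements above can be approximated locally. This is a minimal sketch that uses Python's fnmatch as a stand-in for IAM wildcard matching, which is an assumption and not the real policy engine. The region, account ID, and ARNs are made up for illustration.

```python
from fnmatch import fnmatchcase

# Pattern copied from the prod-invoke policy above.
PROD_PATTERN = "arn:aws:sagemaker:*:*:endpoint/*-prod-*"


def can_invoke(endpoint_arn: str) -> bool:
    """Approximate the prod-invoke decision for InvokeEndpoint.

    fnmatchcase is only a stand-in for IAM's wildcard matching.
    """
    matches_prod = fnmatchcase(endpoint_arn, PROD_PATTERN)
    allowed = matches_prod                # Allow statement uses Resource
    explicitly_denied = not matches_prod  # Deny statement uses NotResource
    return allowed and not explicitly_denied


# Hypothetical ARNs for illustration only.
prod_arn = "arn:aws:sagemaker:us-east-1:111122223333:endpoint/acme-prod-fraud-detection"
dev_arn = "arn:aws:sagemaker:us-east-1:111122223333:endpoint/acme-dev-fraud-detection-v2"

print(can_invoke(prod_arn))  # True
print(can_invoke(dev_arn))   # False
```

Because the Deny uses NotResource, a non-production endpoint stays blocked even if another attached policy allows invocation on it.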

"Sid": "DenyEndpointModifications"

Ensures the principal cannot change the hosting infrastructure, such as creating, updating, or deleting endpoints and endpoint configurations, maintaining environment stability.

Policy Action Denied	Description
sagemaker:CreateEndpoint	(Denied) Creates a new SageMaker endpoint using a specific endpoint configuration.
sagemaker:UpdateEndpoint	(Denied) Deploys a new endpoint configuration to an existing endpoint without taking it offline.
sagemaker:DeleteEndpoint	(Denied) Permanently removes an existing SageMaker endpoint and stops the associated hosting instances.
sagemaker:CreateEndpointConfig	(Denied) Defines a setup for an endpoint, specifying which models to deploy and the hardware instance types to use.
sagemaker:DeleteEndpointConfig	(Denied) Deletes a previously created endpoint configuration.

API Service Integration (Flask):

import boto3
import json
from flask import Flask, request, jsonify

app = Flask(__name__)
runtime = boto3.client('sagemaker-runtime')

@app.route('/check-fraud', methods=['POST'])
def check_fraud():
    transaction = request.json
    
    # Call production endpoint
    response = runtime.invoke_endpoint(
        EndpointName='acme-prod-fraud-detection',
        ContentType='application/json',
        Body=json.dumps(transaction)
    )
    
    prediction = json.loads(response['Body'].read())
    
    return jsonify({
        'transaction_id': transaction['id'],
        'fraud_score': prediction['fraud_score'],
        'action': 'block' if prediction['fraud_score'] > 0.85 else 'approve'
    })

Security Benefit: Production applications cannot call unstable dev/staging endpoints - ensures reliability and data integrity.
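
Decision Rule Sketch:

The threshold rule used in the handler above can be isolated into a small pure function so that it is easy to unit test. This is a sketch only. The 'review' outcome for missing or malformed scores is an assumption added for illustration and is not part of the original handler.

```python
def decide_action(prediction: dict, threshold: float = 0.85) -> str:
    """Return 'block' or 'approve' using the 0.85 rule from the handler.

    Assumption: a missing or malformed score yields 'review' instead of
    being approved silently.
    """
    score = prediction.get("fraud_score")
    # bool is a subclass of int, so reject it explicitly
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return "review"
    if not 0.0 <= score <= 1.0:
        return "review"
    return "block" if score > threshold else "approve"


print(decide_action({"fraud_score": 0.91}))
print(decide_action({"fraud_score": 0.20}))
print(decide_action({}))
```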

Level 4: full¶

Purpose: Complete endpoint lifecycle management across all environments

Principal: Human (MLOps engineers, platform admins)

Typical Users:

MLOps engineers
Platform administrators
Deployment automation (CI/CD)
Infrastructure team

How It Differs from prod-invoke (Level 3):

Adds lifecycle management — create, update, configure, and delete endpoints
Adds autoscaling control — configure scaling policies and instance counts
Account-wide scope — Resource: * across all environments
Includes delete — can decommission obsolete endpoints

What You Can Do:

✅ Everything in prod-invoke, PLUS:
✅ Create new endpoints
✅ Update endpoint configurations
✅ Deploy new model versions
✅ Configure autoscaling policies
✅ Delete obsolete endpoints
✅ Manage endpoint tags

What You Cannot Do:

❌ Delete SageMaker domains or user profiles (explicitly denied); everything else in endpoint management is allowed

Example Scenario:

The MLOps team needs to deploy a new fraud detection model to production. They create an endpoint configuration with the new model, create the endpoint with 2 instances, enable autoscaling, and gradually shift traffic from the old endpoint using blue/green deployment.

Sample Permissions:

[
  {
    "Sid": "SageMakerEndpointLifecycleManagement",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:UpdateEndpoint",
      "sagemaker:UpdateEndpointWeightsAndCapacities",
      "sagemaker:DeleteEndpoint",
      "sagemaker:DeleteEndpointConfig",
      "sagemaker:DescribeEndpoint",
      "sagemaker:DescribeEndpointConfig",
      "sagemaker:ListEndpoints",
      "sagemaker:ListEndpointConfigs"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ModelManagement",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateModel",
      "sagemaker:DescribeModel",
      "sagemaker:DeleteModel",
      "sagemaker:ListModels"
    ],
    "Resource": "*"
  },
  {
    "Sid": "InferenceExecution",
    "Effect": "Allow",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AutoscalingAndMonitoring",
    "Effect": "Allow",
    "Action": [
      "application-autoscaling:RegisterScalableTarget",
      "application-autoscaling:DeregisterScalableTarget",
      "application-autoscaling:PutScalingPolicy",
      "application-autoscaling:DeleteScalingPolicy",
      "application-autoscaling:DescribeScalableTargets",
      "application-autoscaling:DescribeScalingPolicies",
      "cloudwatch:PutMetricAlarm",
      "cloudwatch:DescribeAlarms",
      "cloudwatch:DeleteAlarms"
    ],
    "Resource": "*"
  },
  {
    "Sid": "TaggingAndMetadata",
    "Effect": "Allow",
    "Action": [
      "sagemaker:AddTags",
      "sagemaker:DeleteTags",
      "sagemaker:ListTags"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToSageMaker",
    "Effect": "Allow",
    "Action": [
      "iam:PassRole"
    ],
    "Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "sagemaker.amazonaws.com"
      }
    }
  },
  {
    "Sid": "ExplicitDenyDataDeletion",
    "Effect": "Deny",
    "Action": [
      "sagemaker:DeleteDomain",
      "sagemaker:DeleteUserProfile"
    ],
    "Resource": "*"
  }
]

"Sid": "SageMakerEndpointLifecycleManagement"

Controls the core lifecycle of hosting, including creating, updating, and deleting the physical endpoints and their configurations. Also includes discovery actions (List/Describe) so admins have full visibility.

Policy Action	Description
sagemaker:CreateEndpoint	Launches the actual HTTPS endpoint based on a specific configuration. The call returns while the endpoint is still in the “Creating” state; once provisioning completes, the endpoint becomes “InService” and is ready to process inference requests.
sagemaker:CreateEndpointConfig	Defines the configuration for a model deployment. It acts as a “blueprint” that specifies exactly how SageMaker should host your machine learning model before you actually create the live endpoint.
sagemaker:UpdateEndpoint	Switches an endpoint to a new configuration (e.g., rolling out a new model version). This is typically used for “Blue/Green” deployments to swap a model version or change instance types without downtime.
sagemaker:UpdateEndpointWeightsAndCapacities	Dynamically adjusts the traffic distribution and instance counts of models (production variants) hosted on an active endpoint. Unlike UpdateEndpoint, which often involves deploying a new configuration and can trigger a rolling update, this operation allows you to make “in-place” adjustments to existing variants without changing the underlying Endpoint Config.
sagemaker:DeleteEndpoint	Shuts down the endpoint's hosting instances and stops incurring charges. This action does not delete the endpoint configuration or the models themselves.
sagemaker:DeleteEndpointConfig	Permanently removes the specified endpoint configuration blueprint. Do not delete a configuration that is in use by a live endpoint, or by an endpoint that is being created or updated.
sagemaker:DescribeEndpoint	Views the current status and details of a live SageMaker endpoint.
sagemaker:DescribeEndpointConfig	Retrieves the specific settings defined in an endpoint configuration.
sagemaker:ListEndpoints	Lists all endpoints in the account for platform-wide visibility.
sagemaker:ListEndpointConfigs	Lists all endpoint configurations to browse existing blueprints.
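
Traffic Shift Sketch:

To make the UpdateEndpointWeightsAndCapacities row concrete, the sketch below builds the DesiredWeightsAndCapacities payload for a two-variant split. The variant names and the 10% share are hypothetical. boto3 is imported lazily so the payload builder can be tried without AWS access.

```python
def build_traffic_shift(old_variant: str, new_variant: str, new_share: float) -> list:
    """Build DesiredWeightsAndCapacities for a two-variant traffic split.

    new_share is the fraction of traffic (0.0 to 1.0) sent to new_variant.
    """
    if not 0.0 <= new_share <= 1.0:
        raise ValueError("new_share must be between 0.0 and 1.0")
    return [
        {"VariantName": old_variant, "DesiredWeight": round(1.0 - new_share, 4)},
        {"VariantName": new_variant, "DesiredWeight": round(new_share, 4)},
    ]


def shift_traffic(endpoint_name: str, weights: list) -> None:
    """Apply the weights; needs sagemaker:UpdateEndpointWeightsAndCapacities."""
    import boto3  # lazy import: only required when calling AWS

    client = boto3.client("sagemaker")
    client.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=weights,
    )


# Hypothetical variant names; send 10% of traffic to the new variant.
weights = build_traffic_shift("fraud-v2", "fraud-v3", 0.10)
print(weights)
```

Both variants must already exist in the endpoint's current configuration; this call adjusts weights in place and does not deploy a new configuration.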

"Sid": "ModelManagement"

Allows the definition of the software/model artifacts that the endpoints will run.

Policy Action	Description
sagemaker:CreateModel	Grants permission to create a model in SageMaker. This process involves naming the model and specifying the Docker container image, model artifacts (usually from S3), and inference code required for deployment.
sagemaker:DescribeModel	Grants permission to view the details of a specific model. This returns information about the model’s configuration, such as the primary container, execution role, and creation time.
sagemaker:DeleteModel	Grants permission to delete a model resource. This action only removes the model entry in SageMaker; it does not delete the underlying model artifacts in S3 or the associated IAM roles.
sagemaker:ListModels	Lists all models in the account. Admins need this to discover and audit models before associating them with endpoint configurations.

"Sid": "InferenceExecution"

Grants the ability to send data to endpoints across all environments. At Level 4, there is no environment restriction — admins need to invoke any endpoint for testing, validation, and troubleshooting.

Policy Action	Description
sagemaker:InvokeEndpoint	Sends data to a real-time endpoint for a prediction/inference response.
sagemaker:InvokeEndpointAsync	Sends data to an asynchronous endpoint for inference. Unlike a real-time request, the model processes the data in the background and saves the prediction result to an S3 bucket rather than returning it immediately.
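
Async Invocation Sketch:

The difference between the two invoke actions is easiest to see in code. The sketch below submits an asynchronous request and returns the S3 location where the result will appear. The endpoint name, bucket, and keys are hypothetical, and boto3 is imported lazily so the URI helper runs on its own.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def submit_async(endpoint_name: str, input_uri: str) -> str:
    """Queue an async inference request and return the S3 output location."""
    import boto3  # lazy import: only required when calling AWS

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_uri,  # the request payload is read from S3
        ContentType="application/json",
    )
    return response["OutputLocation"]


# Offline example with a hypothetical output location.
bucket, key = split_s3_uri("s3://example-bucket/async-results/abc123.out")
print(bucket, key)
```

The caller also needs S3 permissions to upload the input object and read the output object; those are outside the Inference policy shown here.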

"Sid": "AutoscalingAndMonitoring"

Manages the horizontal scaling rules (adding/removing instances) based on traffic demand, along with the CloudWatch alarms that trigger them.

Policy Action	Description
application-autoscaling:RegisterScalableTarget	Registers an AWS or custom resource as a scalable target, allowing Application Auto Scaling to manage it. It also sets or updates the minimum and maximum capacity limits.
application-autoscaling:DeregisterScalableTarget	Removes a resource from being a scalable target. This action also deletes all associated scaling policies and scheduled actions for that resource.
application-autoscaling:PutScalingPolicy	Creates or updates a scaling policy (target tracking, step scaling, or predictive) for a registered scalable target to automate capacity adjustments.
application-autoscaling:DeleteScalingPolicy	Deletes a specific scaling policy. For target tracking, it also removes the CloudWatch alarms created on your behalf; for step scaling, it deletes the alarm action but not the alarm itself.
application-autoscaling:DescribeScalableTargets	Retrieves detailed information about one or more scalable targets in a specified service namespace, including their current capacity limits.
application-autoscaling:DescribeScalingPolicies	Returns information about the scaling policies for the specified service namespace and scalable targets.
cloudwatch:PutMetricAlarm	Creates or updates an alarm and associates it with a specific metric. In an autoscaling context, these alarms trigger the scaling policies when thresholds are breached.
cloudwatch:DescribeAlarms	Retrieves information about specified alarms. It is often used to verify the status or configuration of alarms used by autoscaling policies.
cloudwatch:DeleteAlarms	Deletes the specified alarms. This is used during cleanup to ensure that unused CloudWatch alarms are removed after a scaling policy or resource is deleted.
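
Autoscaling Sketch:

The actions above are typically used together: register the endpoint variant as a scalable target, then attach a target tracking policy. The sketch below builds both requests. The capacity limits, target value, and policy name are assumptions chosen for illustration; boto3 is imported lazily.

```python
def autoscaling_requests(endpoint_name, variant_name, min_instances,
                         max_instances, invocations_per_instance):
    """Return kwargs for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    policy = dict(
        common,
        PolicyName=f"{endpoint_name}-{variant_name}-target-tracking",  # hypothetical
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )
    return target, policy


def apply_autoscaling(target: dict, policy: dict) -> None:
    import boto3  # lazy import: only required when calling AWS

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


# Assumed values: scale between 2 and 6 instances, 100 invocations per instance.
target, policy = autoscaling_requests(
    "acme-prod-fraud-detection", "AllTraffic", 2, 6, 100
)
print(target["ResourceId"])
```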

"Sid": "TaggingAndMetadata"

Enables resource organization, cost tracking, and access control via metadata tags.

Policy Action	Description
sagemaker:AddTags	Grants permission to add or overwrite one or more tags for a specified SageMaker resource (e.g., notebook instances, models, or training jobs).
sagemaker:DeleteTags	Grants permission to remove one or more specific tags from a SageMaker resource.
sagemaker:ListTags	Grants permission to view/list all tags currently associated with a specific SageMaker resource.

"Sid": "PassRoleToSageMaker"

Required for CreateModel and CreateEndpoint — SageMaker needs an execution role to pull model artifacts from S3 and write logs. Scoped to roles matching the platform’s naming convention and conditioned to SageMaker only, preventing privilege escalation to other services.

Policy Action	Description
iam:PassRole	Assigns IAM execution roles to SageMaker models and endpoints. Scoped to `{company_prefix}-{env}-*-role-*` and conditioned to `sagemaker.amazonaws.com` only.

"Sid": "ExplicitDenyDataDeletion"

Safety net to protect the SageMaker Studio environment itself. Even with full endpoint management, admins should not accidentally destroy the shared platform infrastructure. An explicit Deny ensures no other policy can override this protection.

Policy Action Denied	Description
sagemaker:DeleteDomain	Prevents the accidental deletion of the entire SageMaker Studio environment, which includes all user settings and shared resources.
sagemaker:DeleteUserProfile	Prevents the deletion of individual user profiles. Deleting a profile causes the user to lose access to their associated data, notebooks, and artifacts stored in their EFS volume.

Deployment Script:

import boto3

sagemaker = boto3.client('sagemaker')

# Create endpoint configuration
config_name = 'fraud-detection-v3-config'
sagemaker.create_endpoint_config(
    EndpointConfigName=config_name,
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'fraud-detection-v3',
        'InitialInstanceCount': 2,
        'InstanceType': 'ml.m5.xlarge'
    }]
)

# Create endpoint
endpoint_name = 'acme-prod-fraud-detection'
sagemaker.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=config_name,
    Tags=[
        {'Key': 'Environment', 'Value': 'production'},
        {'Key': 'Model', 'Value': 'fraud-detection'},
        {'Key': 'Version', 'Value': 'v3'}
    ]
)

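# create_endpoint returns while the endpoint is still in the Creating state,
# so wait until it reaches InService before reporting success
sagemaker.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)

print(f"Endpoint {endpoint_name} is InService")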

Security Note: ⚠️ This level should be assigned sparingly. Most users need dev-invoke or prod-invoke.

Level 4-ci: deploy-only¶

Purpose: Automated deployment of endpoints and models without destructive or traffic-shifting actions

Principal: Machine (CI/CD pipelines, deployment automation)

Typical Users:

CI/CD deployment pipelines (CodePipeline, GitHub Actions, GitLab CI)
Automated model deployment workflows
Infrastructure-as-Code automation (CloudFormation, CDK)

How It Differs from full (Level 4):

No delete actions — pipelines deploy forward, never tear down
No traffic weight shifting — canary/blue-green traffic decisions are a separate human or canary-pipeline concern
Same create/update/invoke scope — full deployment capability across all environments
Machine identity only — assigned to service roles, never to human users

What You Can Do:

✅ The deployment subset of full (no delete or traffic-shifting actions), plus Model Registry actions:
✅ Create new endpoints and endpoint configurations
✅ Update existing endpoints to new configurations
✅ Register models for deployment
✅ Invoke endpoints across all environments (smoke tests)
✅ Configure autoscaling policies
✅ Tag resources with deployment metadata
✅ Pass execution roles to SageMaker

What You Cannot Do:

❌ Delete endpoints, endpoint configurations, or models
❌ Shift traffic weights between production variants
❌ Delete SageMaker domains or user profiles

Example Scenario:

The CI/CD pipeline receives a merged PR that triggers a model deployment. It creates a new endpoint configuration with the updated model artifact, updates the production endpoint to use the new configuration, configures autoscaling, and runs a smoke test by invoking the endpoint. It cannot delete the old endpoint configuration or model; that is a separate cleanup job requiring human approval.

Sample Permissions:

[
  {
    "Sid": "SageMakerEndpointDeployment",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateEndpoint",
      "sagemaker:CreateEndpointConfig",
      "sagemaker:UpdateEndpoint",
      "sagemaker:DescribeEndpoint",
      "sagemaker:DescribeEndpointConfig",
      "sagemaker:ListEndpoints",
      "sagemaker:ListEndpointConfigs"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ModelRegistration",
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateModel",
      "sagemaker:CreateModelPackage",
      "sagemaker:CreateModelPackageGroup",
      "sagemaker:DescribeModel",
      "sagemaker:DescribeModelPackage",
      "sagemaker:DescribeModelPackageGroup",
      "sagemaker:ListModels",
      "sagemaker:ListModelPackages",
      "sagemaker:ListModelPackageGroups",
      "sagemaker:UpdateModelPackage"
    ],
    "Resource": "*"
  },
  {
    "Sid": "InferenceExecution",
    "Effect": "Allow",
    "Action": [
      "sagemaker:InvokeEndpoint",
      "sagemaker:InvokeEndpointAsync"
    ],
    "Resource": "*"
  },
  {
    "Sid": "AutoscalingConfiguration",
    "Effect": "Allow",
    "Action": [
      "application-autoscaling:RegisterScalableTarget",
      "application-autoscaling:PutScalingPolicy",
      "application-autoscaling:DescribeScalableTargets",
      "application-autoscaling:DescribeScalingPolicies",
      "cloudwatch:PutMetricAlarm",
      "cloudwatch:DescribeAlarms"
    ],
    "Resource": "*"
  },
  {
    "Sid": "TaggingAndMetadata",
    "Effect": "Allow",
    "Action": [
      "sagemaker:AddTags",
      "sagemaker:ListTags"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToSageMaker",
    "Effect": "Allow",
    "Action": [
      "iam:PassRole"
    ],
    "Resource": "arn:aws:iam::*:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "sagemaker.amazonaws.com"
      }
    }
  },
  {
    "Sid": "DenyDestructiveActions",
    "Effect": "Deny",
    "Action": [
      "sagemaker:DeleteEndpoint",
      "sagemaker:DeleteEndpointConfig",
      "sagemaker:DeleteModel",
      "sagemaker:DeleteModelPackage",
      "sagemaker:DeleteModelPackageGroup",
      "sagemaker:UpdateEndpointWeightsAndCapacities",
      "sagemaker:DeleteDomain",
      "sagemaker:DeleteUserProfile"
    ],
    "Resource": "*"
  }
]

"Sid": "SageMakerEndpointDeployment"

Grants the core deployment actions: create new endpoints and configurations, update existing endpoints to new configurations, and discover/inspect all endpoints. Excludes delete actions — teardown is not a CI/CD pipeline responsibility.

Policy Action	Description
sagemaker:CreateEndpoint	Launches a new HTTPS endpoint based on a specific configuration.
sagemaker:CreateEndpointConfig	Defines the deployment blueprint specifying model, instance type, and variant configuration.
sagemaker:UpdateEndpoint	Switches an endpoint to a new configuration for rolling deployments and model version updates.
sagemaker:DescribeEndpoint	Returns detailed information about an endpoint’s current status and configuration.
sagemaker:DescribeEndpointConfig	Retrieves the settings defined in an endpoint configuration.
sagemaker:ListEndpoints	Lists all endpoints in the account for deployment verification.
sagemaker:ListEndpointConfigs	Lists all endpoint configurations for blueprint discovery.

"Sid": "ModelRegistration"

Allows the pipeline to register model artifacts and manage model packages in the SageMaker Model Registry. Includes UpdateModelPackage for automated approval workflows.

Policy Action	Description
sagemaker:CreateModel	Creates a model resource pointing to the container image and S3 model artifacts.
sagemaker:CreateModelPackage	Registers a model version in a model package group for versioned tracking.
sagemaker:CreateModelPackageGroup	Creates a new model package group to organize model versions.
sagemaker:DescribeModel	Views details of a specific model resource.
sagemaker:DescribeModelPackage	Views details of a specific model package version.
sagemaker:DescribeModelPackageGroup	Views details of a model package group.
sagemaker:ListModels	Lists all models in the account.
sagemaker:ListModelPackages	Lists model package versions within a group.
sagemaker:ListModelPackageGroups	Lists all model package groups.
sagemaker:UpdateModelPackage	Updates model package metadata, including approval status for automated promotion workflows.
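
Approval Workflow Sketch:

An automated promotion step usually comes down to one UpdateModelPackage call that sets the approval status. The sketch below builds that request. The model package ARN and the approval note are hypothetical, and boto3 is imported lazily.

```python
VALID_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def approval_request(model_package_arn: str, status: str, note: str = "") -> dict:
    """Build keyword arguments for the SageMaker update_model_package call."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unsupported approval status: {status}")
    request = {"ModelPackageArn": model_package_arn, "ModelApprovalStatus": status}
    if note:
        request["ApprovalDescription"] = note
    return request


def set_approval(request: dict) -> None:
    """Send the request; needs sagemaker:UpdateModelPackage."""
    import boto3  # lazy import: only required when calling AWS

    boto3.client("sagemaker").update_model_package(**request)


# Hypothetical ARN and note for illustration.
request = approval_request(
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/fraud-detection/3",
    "Approved",
    note="Passed automated evaluation",
)
print(request["ModelApprovalStatus"])
```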

"Sid": "InferenceExecution"

Grants invoke access across all environments for post-deployment smoke tests and health checks.

Policy Action	Description
sagemaker:InvokeEndpoint	Sends a synchronous request to an endpoint for real-time inference.
sagemaker:InvokeEndpointAsync	Sends an asynchronous inference request for large payloads or long processing.

"Sid": "AutoscalingConfiguration"

Allows the pipeline to configure autoscaling after deployment. Excludes DeregisterScalableTarget and DeleteScalingPolicy — scaling teardown is a destructive action.

Policy Action	Description
application-autoscaling:RegisterScalableTarget	Registers an endpoint variant as a scalable target with min/max capacity.
application-autoscaling:PutScalingPolicy	Creates or updates a scaling policy (target tracking, step, or predictive).
application-autoscaling:DescribeScalableTargets	Retrieves information about registered scalable targets.
application-autoscaling:DescribeScalingPolicies	Returns information about scaling policies for verification.
cloudwatch:PutMetricAlarm	Creates alarms that trigger scaling policies when thresholds are breached.
cloudwatch:DescribeAlarms	Retrieves alarm status for deployment verification.

"Sid": "TaggingAndMetadata"

Allows the pipeline to tag deployed resources with deployment metadata (commit hash, pipeline run ID, version). Excludes DeleteTags — tag cleanup is not a deployment concern.

Policy Action	Description
sagemaker:AddTags	Adds or overwrites tags on SageMaker resources for tracking and cost allocation.
sagemaker:ListTags	Lists tags on a resource for verification after tagging.
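
Deployment Tagging Sketch:

A pipeline can turn its run metadata into the Tags list that AddTags expects. This is a minimal sketch; the tag keys (CommitHash, PipelineRunId, Version) are assumptions and should follow your own tagging standard.

```python
def deployment_tags(commit_hash: str, pipeline_run_id: str, version: str) -> list:
    """Convert deployment metadata into the Tags structure used by add_tags."""
    metadata = {
        "CommitHash": commit_hash,        # hypothetical tag key
        "PipelineRunId": pipeline_run_id,  # hypothetical tag key
        "Version": version,                # hypothetical tag key
    }
    # Skip empty values: tags with no value add noise to cost reports
    return [{"Key": k, "Value": str(v)} for k, v in metadata.items() if v]


def tag_resource(resource_arn: str, tags: list) -> None:
    """Apply the tags; needs sagemaker:AddTags."""
    import boto3  # lazy import: only required when calling AWS

    boto3.client("sagemaker").add_tags(ResourceArn=resource_arn, Tags=tags)


tags = deployment_tags("abc1234", "run-42", "v3")
print(tags)
```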

"Sid": "PassRoleToSageMaker"

Required for CreateModel and CreateEndpoint — SageMaker needs an execution role to pull model artifacts from S3 and write logs. Scoped to roles matching the platform’s naming convention and conditioned to SageMaker only.

Policy Action	Description
iam:PassRole	Assigns IAM execution roles to SageMaker models and endpoints. Scoped to `{company_prefix}-{env}-*-role-*` and conditioned to `sagemaker.amazonaws.com` only.

"Sid": "DenyDestructiveActions"

Explicit deny on all destructive and traffic-shifting actions. This is the core guardrail that differentiates level4-ci from level4. CI/CD pipelines deploy forward — teardown and traffic shifting require separate authorization.

Policy Action Denied	Description
sagemaker:DeleteEndpoint	(Denied) Prevents pipeline from removing endpoints in any environment, including production.
sagemaker:DeleteEndpointConfig	(Denied) Prevents pipeline from removing endpoint configuration blueprints.
sagemaker:DeleteModel	(Denied) Prevents pipeline from removing model resources.
sagemaker:DeleteModelPackage	(Denied) Prevents pipeline from removing model package versions.
sagemaker:DeleteModelPackageGroup	(Denied) Prevents pipeline from removing model package groups.
sagemaker:UpdateEndpointWeightsAndCapacities	(Denied) Prevents pipeline from shifting traffic between production variants. Traffic decisions should be a separate human or canary-pipeline concern.
sagemaker:DeleteDomain	(Denied) Prevents accidental deletion of the SageMaker Studio environment.
sagemaker:DeleteUserProfile	(Denied) Prevents deletion of individual user profiles and their associated data.
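
Policy Evaluation Sketch:

The deploy-only policy blocks actions in two ways: explicitly (listed in DenyDestructiveActions) and implicitly (not listed in any Allow statement). The sketch below models that with plain sets copied from the sample permissions above. It is a simplification that assumes no other policies are attached and ignores resource scoping.

```python
ALLOWED = {
    "sagemaker:CreateEndpoint", "sagemaker:CreateEndpointConfig",
    "sagemaker:UpdateEndpoint", "sagemaker:DescribeEndpoint",
    "sagemaker:DescribeEndpointConfig", "sagemaker:ListEndpoints",
    "sagemaker:ListEndpointConfigs", "sagemaker:CreateModel",
    "sagemaker:CreateModelPackage", "sagemaker:CreateModelPackageGroup",
    "sagemaker:DescribeModel", "sagemaker:DescribeModelPackage",
    "sagemaker:DescribeModelPackageGroup", "sagemaker:ListModels",
    "sagemaker:ListModelPackages", "sagemaker:ListModelPackageGroups",
    "sagemaker:UpdateModelPackage", "sagemaker:InvokeEndpoint",
    "sagemaker:InvokeEndpointAsync",
    "application-autoscaling:RegisterScalableTarget",
    "application-autoscaling:PutScalingPolicy",
    "application-autoscaling:DescribeScalableTargets",
    "application-autoscaling:DescribeScalingPolicies",
    "cloudwatch:PutMetricAlarm", "cloudwatch:DescribeAlarms",
    "sagemaker:AddTags", "sagemaker:ListTags", "iam:PassRole",
}

DENIED = {
    "sagemaker:DeleteEndpoint", "sagemaker:DeleteEndpointConfig",
    "sagemaker:DeleteModel", "sagemaker:DeleteModelPackage",
    "sagemaker:DeleteModelPackageGroup",
    "sagemaker:UpdateEndpointWeightsAndCapacities",
    "sagemaker:DeleteDomain", "sagemaker:DeleteUserProfile",
}


def evaluate(action: str) -> str:
    """Explicit deny wins, then allow, otherwise implicit deny."""
    if action in DENIED:
        return "explicit deny"
    if action in ALLOWED:
        return "allow"
    return "implicit deny"


for action in ("sagemaker:UpdateEndpoint",
               "sagemaker:DeleteEndpoint",
               "sagemaker:DeleteTags"):
    print(action, "->", evaluate(action))
```

The distinction matters when policies are combined: an implicit deny such as sagemaker:DeleteTags can be lifted by another attached policy, while an explicit deny cannot.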

Security Note: ⚠️ This level is designed exclusively for machine identities (service roles). Never assign it to human users. Humans who need full endpoint management should use level4 (full).

Lambda Inference¶

Lambda Inference policies control access to Lambda functions that serve ML models for predictions. Lambda is ideal for lightweight, event-driven inference workloads where cold start latency is acceptable and cost optimization is a priority.

Level 1: invoke-only¶

Purpose: Call Lambda inference functions without managing them

Principal: Machine (backend services, API Gateway) or Human (developers testing)

Typical Users:

Backend API services calling model endpoints
API Gateway integrations
Event-driven architectures (S3 triggers, SQS consumers)
Developers testing inference locally

What You Can Do:

✅ Invoke Lambda inference functions
✅ View function configuration and metadata
✅ List available inference functions
✅ Check function state and last update status

What You Cannot Do:

❌ Create or delete Lambda functions
❌ Modify function code or configuration
❌ Change memory, timeout, or environment variables
❌ Manage layers or aliases

Example Scenario:

An API Gateway route receives image classification requests from a mobile app. It invokes a Lambda function that loads a lightweight PyTorch model and returns the predicted label. The API service only needs invoke permission — it never modifies the function.

Sample Permissions:

[
  {
    "Sid": "LambdaDiscoveryListActions",
    "Effect": "Allow",
    "Action": [
      "lambda:ListFunctions"
    ],
    "Resource": "*"
  },
  {
    "Sid": "LambdaDiscoveryActions",
    "Effect": "Allow",
    "Action": [
      "lambda:ListAliases",
      "lambda:ListTags",
      "lambda:GetFunction",
      "lambda:GetFunctionConfiguration",
      "lambda:GetPolicy",
      "lambda:GetAlias",
      "lambda:GetFunctionUrlConfig",
      "lambda:ListFunctionUrlConfigs",
      "lambda:GetProvisionedConcurrencyConfig",
      "lambda:ListProvisionedConcurrencyConfigs"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaInvocationActions",
    "Effect": "Allow",
    "Action": [
      "lambda:InvokeFunction",
      "lambda:InvokeFunctionUrl",
      "lambda:GetFunctionEventInvokeConfig",
      "lambda:ListFunctionEventInvokeConfigs",
      "lambda:GetFunctionConcurrency"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  }
]

"Sid": "LambdaDiscoveryListActions"

Grants permission to list Lambda functions so that users can discover the available inference functions. ListFunctions does not support resource-level permissions, so this statement uses Resource: *. Aliases and tags are covered by the LambdaDiscoveryActions statement.

Policy Action	Description
lambda:ListFunctions	Retrieves a list of all Lambda functions in the region to identify inference endpoints.

"Sid": "LambdaDiscoveryActions"

Provides permissions to view detailed information about Lambda functions, including their configuration, access policies, aliases, URL configurations, and concurrency settings. This allows users to understand the capabilities and status of inference functions without modifying them.

Policy Action	Description
lambda:ListAliases	Lists all aliases for a specific function to find different deployment versions.
lambda:ListTags	Lists tags assigned to the function for resource filtering and organization.
lambda:GetFunction	Returns the configuration and a pre-signed URL to download the deployment package.
lambda:GetFunctionConfiguration	Provides specific metadata like runtime, handler, and environment variables.
lambda:GetPolicy	Retrieves the resource-based policy to verify access permissions.
lambda:GetAlias	Retrieves information about a specific function alias (e.g., ‘prod’ or ‘staging’).
lambda:GetFunctionUrlConfig	Returns the URL configuration for functions used as direct HTTP(S) endpoints.
lambda:ListFunctionUrlConfigs	Lists all URL configurations associated with a function.
lambda:GetProvisionedConcurrencyConfig	Retrieves the status and details of the Provisioned Concurrency setup for a specific function version or alias.
lambda:ListProvisionedConcurrencyConfigs	Lists all provisioned concurrency configurations for a function to assess scaling readiness.

"Sid": "LambdaInvocationActions"

Grants permissions to execute Lambda functions and to inspect their asynchronous invocation and concurrency settings. This allows users to invoke inference functions for predictions while still preventing any modifications to the function code or configuration.

Policy Action	Description
lambda:InvokeFunction	The primary action for synchronous or asynchronous execution of the inference code.
lambda:InvokeFunctionUrl	Enables execution via the built-in Lambda HTTP(S) endpoint.
lambda:GetFunctionEventInvokeConfig	Retrieves configuration for asynchronous delivery, such as destination and retry attempts.
lambda:ListFunctionEventInvokeConfigs	Lists all asynchronous invocation configurations for the function.
lambda:GetFunctionConcurrency	Allows checking if the function has enough reserved capacity to handle the expected inference load.
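
Python Example:

A minimal sketch of a synchronous invocation under invoke-only. The function name and payload fields are hypothetical. The response parser is kept separate so it can be exercised offline with a stand-in response, and boto3 is imported lazily.

```python
import io
import json


def parse_invoke_response(response: dict) -> dict:
    """Decode a Lambda Invoke response; raise if the function reported an error."""
    body = json.loads(response["Payload"].read())
    # Lambda sets FunctionError when the handler itself failed
    if "FunctionError" in response:
        raise RuntimeError(f"inference function failed: {body}")
    return body


def classify(function_name: str, features: dict) -> dict:
    """Invoke the inference function synchronously; needs lambda:InvokeFunction."""
    import boto3  # lazy import: only required when calling AWS

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(features).encode("utf-8"),
    )
    return parse_invoke_response(response)


# Offline check with a stand-in response object (no AWS call is made).
fake_response = {"StatusCode": 200, "Payload": io.BytesIO(b'{"label": "cat"}')}
result = parse_invoke_response(fake_response)
print(result)
```

Checking FunctionError matters because a failed handler still returns HTTP status 200 from the Invoke API.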

Level 2: deploy-manage¶

Purpose: Deploy and configure Lambda inference functions

Principal: Human (ML engineers, DevOps) or Machine (CI/CD pipelines)

Typical Users:

ML engineers packaging models into Lambda functions
DevOps engineers configuring memory, timeout, and concurrency
CI/CD pipelines deploying new model versions
Data scientists publishing lightweight models

How It Differs from invoke-only (Level 1):

Adds deployment — create, update, and publish function versions
Adds configuration — modify memory, timeout, environment variables, layers
Adds alias management — create aliases for blue/green and canary deployments
Still no delete — function removal requires Level 3

What You Can Do:

✅ Everything in invoke-only, PLUS:
✅ Create new Lambda inference functions
✅ Update function code with new model versions
✅ Configure memory, timeout, and concurrency settings
✅ Manage function aliases for traffic shifting
✅ Add and update Lambda layers (model dependencies)
✅ Set environment variables (model paths, feature flags)

What You Cannot Do:

❌ Delete Lambda functions
❌ Modify IAM execution roles
❌ Change VPC or security group settings

Example Scenario:

An ML engineer has retrained the image classification model and needs to deploy the new version. They update the Lambda function code, publish a new version, and shift 10% of traffic to the new version via a weighted alias — all without touching the production alias until validation passes.
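
Weighted Alias Sketch:

The 10% traffic shift in this scenario maps to the alias RoutingConfig. The sketch below builds an UpdateAlias request that keeps the alias on the stable version and routes a fraction of invocations to the new one. The function name, alias name, and version numbers are hypothetical; boto3 is imported lazily.

```python
def canary_alias_update(function_name: str, alias: str, stable_version: str,
                        canary_version: str, canary_weight: float) -> dict:
    """Build kwargs for lambda update_alias with weighted routing."""
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0.0 and 1.0")
    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": stable_version,  # receives the remaining traffic
        "RoutingConfig": {
            "AdditionalVersionWeights": {canary_version: canary_weight}
        },
    }


def apply_alias_update(request: dict) -> None:
    """Send the request; needs lambda:UpdateAlias."""
    import boto3  # lazy import: only required when calling AWS

    boto3.client("lambda").update_alias(**request)


# Hypothetical names: version 7 is stable, version 8 is the retrained model.
request = canary_alias_update(
    "acme-dev-image-classifier", "canary", "7", "8", 0.10
)
print(request["RoutingConfig"])
```

Setting the weight back to 0.0, or passing an empty AdditionalVersionWeights map, returns all traffic to the stable version.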

Sample Permissions:

[
  {
    "Sid": "LambdaGlobalDiscovery",
    "Effect": "Allow",
    "Action": [
      "lambda:GetAccountSettings",
      "lambda:ListFunctions",
      "lambda:ListLayers",
      "lambda:ListLayerVersions",
      "lambda:ListCodeSigningConfigs",
      "lambda:ListEventSourceMappings"
    ],
    "Resource": "*"
  },
  {
    "Sid": "LambdaFunctionDiscovery",
    "Effect": "Allow",
    "Action": [
      "lambda:GetAlias",
      "lambda:GetFunction",
      "lambda:GetFunctionCodeSigningConfig",
      "lambda:GetFunctionConcurrency",
      "lambda:GetFunctionConfiguration",
      "lambda:GetFunctionEventInvokeConfig",
      "lambda:GetFunctionUrlConfig",
      "lambda:GetPolicy",
      "lambda:GetProvisionedConcurrencyConfig",
      "lambda:GetRuntimeManagementConfig",
      "lambda:ListAliases",
      "lambda:ListFunctionEventInvokeConfigs",
      "lambda:ListFunctionUrlConfigs",
      "lambda:ListProvisionedConcurrencyConfigs",
      "lambda:ListTags",
      "lambda:ListVersionsByFunction"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaLayerDiscovery",
    "Effect": "Allow",
    "Action": [
      "lambda:GetLayerVersion",
      "lambda:GetLayerVersionPolicy"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:layer:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaInvocation",
    "Effect": "Allow",
    "Action": [
      "lambda:InvokeFunction",
      "lambda:InvokeFunctionUrl"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaDeploymentAndConfiguration",
    "Effect": "Allow",
    "Action": [
      "lambda:CreateFunction",
      "lambda:UpdateFunctionCode",
      "lambda:UpdateFunctionConfiguration",
      "lambda:PublishVersion",
      "lambda:CreateAlias",
      "lambda:UpdateAlias",
      "lambda:PutFunctionConcurrency",
      "lambda:PutFunctionEventInvokeConfig",
      "lambda:PutProvisionedConcurrencyConfig",
      "lambda:CreateFunctionUrlConfig",
      "lambda:UpdateFunctionUrlConfig",
      "lambda:TagResource"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:function:{company_prefix}-{env}-*"
  },
  {
    "Sid": "LambdaLayerManagement",
    "Effect": "Allow",
    "Action": [
      "lambda:PublishLayerVersion"
    ],
    "Resource": "arn:aws:lambda:{region}:{account_id}:layer:{company_prefix}-{env}-*"
  },
  {
    "Sid": "PassRoleToLambda",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "lambda.amazonaws.com"
      }
    }
  },
  {
    "Sid": "DenyDeleteAndPermissionChanges",
    "Effect": "Deny",
    "Action": [
      "lambda:DeleteFunction",
      "lambda:DeleteAlias",
      "lambda:DeleteFunctionUrlConfig",
      "lambda:DeleteFunctionConcurrency",
      "lambda:DeleteFunctionEventInvokeConfig",
      "lambda:DeleteProvisionedConcurrencyConfig",
      "lambda:DeleteLayerVersion",
      "lambda:AddPermission",
      "lambda:RemovePermission"
    ],
    "Resource": "*"
  }
]
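
The `{region}`, `{account_id}`, `{company_prefix}`, and `{env}` placeholders in the sample above need concrete values before the policy can be attached. The sketch below shows one way that substitution might look. The `render` helper and the example values are assumptions for illustration, not the guide's actual template generator.

```python
def render(node, values):
    """Recursively substitute {placeholders} in every string of a policy fragment."""
    if isinstance(node, str):
        return node.format(**values)
    if isinstance(node, list):
        return [render(item, values) for item in node]
    if isinstance(node, dict):
        return {key: render(value, values) for key, value in node.items()}
    return node


pass_role_template = {
    "Sid": "PassRoleToLambda",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
    "Condition": {"StringEquals": {"iam:PassedToService": "lambda.amazonaws.com"}},
}

# Illustrative values only; real values would come from the client config.
values = {"region": "us-west-2", "account_id": "123456789012",
          "company_prefix": "acme", "env": "dev"}
rendered = render(pass_role_template, values)
print(rendered["Resource"])  # arn:aws:iam::123456789012:role/acme-dev-*-role-*
```

A missing placeholder value raises KeyError, which surfaces incomplete configs early. Statements that use IAM policy variables such as `${aws:username}` would need a different substitution scheme, because `str.format` treats their braces as placeholders.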

"Sid": "LambdaGlobalDiscovery"

Grants account-level and cross-function read access for actions that do not support resource-level scoping. These actions require Resource: * per the AWS Service Authorization Reference.

Policy Action	Description
lambda:GetAccountSettings	Returns account-level limits and usage such as concurrent execution quotas.
lambda:ListFunctions	Retrieves a list of all Lambda functions in the region.
lambda:ListLayers	Lists all Lambda layers available in the region.
lambda:ListLayerVersions	Lists published versions of a specific layer.
lambda:ListCodeSigningConfigs	Lists code signing configurations in the account.
lambda:ListEventSourceMappings	Lists event source mappings in the account.

"Sid": "LambdaFunctionDiscovery"

Grants read-only access to function-level metadata, configuration, aliases, concurrency settings, and URL configurations. All actions are scoped to tenant-prefixed functions.

Policy Action	Description
lambda:GetAlias	Returns details about a specific function alias.
lambda:GetFunction	Returns the function configuration and a pre-signed URL for the deployment package.
lambda:GetFunctionCodeSigningConfig	Returns the code signing config attached to a function.
lambda:GetFunctionConcurrency	Returns the reserved concurrency configuration for a function.
lambda:GetFunctionConfiguration	Returns version-specific settings such as runtime, handler, memory, and timeout.
lambda:GetFunctionEventInvokeConfig	Returns the asynchronous invocation configuration (retries, destinations).
lambda:GetFunctionUrlConfig	Returns the function URL configuration for direct HTTP(S) access.
lambda:GetPolicy	Returns the resource-based policy attached to the function.
lambda:GetProvisionedConcurrencyConfig	Returns the provisioned concurrency configuration for an alias or version.
lambda:GetRuntimeManagementConfig	Returns the runtime management configuration (auto or manual updates).
lambda:ListAliases	Lists all aliases for a specific function.
lambda:ListFunctionEventInvokeConfigs	Lists asynchronous invocation configurations for a function.
lambda:ListFunctionUrlConfigs	Lists URL configurations associated with a function.
lambda:ListProvisionedConcurrencyConfigs	Lists provisioned concurrency configurations for a function.
lambda:ListTags	Lists tags assigned to the function.
lambda:ListVersionsByFunction	Lists published versions of a function.

"Sid": "LambdaLayerDiscovery"

Grants read-only access to layer version details and policies. Layer actions require a layer ARN, not a function ARN, so they are scoped separately.

Policy Action	Description
lambda:GetLayerVersion	Returns details about a specific layer version, including the download URL.
lambda:GetLayerVersionPolicy	Returns the resource-based policy for a layer version.

"Sid": "LambdaInvocation"

Grants permission to execute Lambda inference functions via direct invocation or function URLs. Scoped to tenant-prefixed functions.

Policy Action	Description
lambda:InvokeFunction	Sends a synchronous or asynchronous request to execute the function.
lambda:InvokeFunctionUrl	Invokes the function via its built-in HTTP(S) endpoint.

"Sid": "LambdaDeploymentAndConfiguration"

Grants permissions to create functions, deploy new code versions, configure runtime settings, manage aliases for traffic shifting, and set concurrency. Does not include delete actions.

Policy Action	Description
lambda:CreateFunction	Creates a new Lambda function with the specified code and configuration.
lambda:UpdateFunctionCode	Deploys new code to an existing function (e.g., updated model artifact).
lambda:UpdateFunctionConfiguration	Modifies function settings such as memory, timeout, and environment variables.
lambda:PublishVersion	Creates an immutable snapshot of the current function code and configuration.
lambda:CreateAlias	Creates a named alias pointing to a function version for traffic routing.
lambda:UpdateAlias	Updates an alias to point to a different version or adjust traffic weights.
lambda:PutFunctionConcurrency	Sets reserved concurrency to guarantee execution capacity.
lambda:PutFunctionEventInvokeConfig	Configures asynchronous invocation settings (retries, destinations).
lambda:PutProvisionedConcurrencyConfig	Allocates provisioned concurrency to reduce cold starts.
lambda:CreateFunctionUrlConfig	Creates an HTTP(S) endpoint for direct function invocation.
lambda:UpdateFunctionUrlConfig	Modifies the function URL configuration.
lambda:TagResource	Adds or updates tags on the function for organization and cost tracking.

"Sid": "LambdaLayerManagement"

Grants permission to publish new layer versions containing model dependencies, shared libraries, or custom runtimes. Layer actions require a layer ARN, scoped separately from functions.

Policy Action	Description
lambda:PublishLayerVersion	Publishes a new version of a layer with updated dependencies or libraries.

"Sid": "PassRoleToLambda"

Allows passing an IAM execution role to Lambda when creating or updating functions. Scoped to tenant-prefixed roles and conditioned to the Lambda service only.

Policy Action	Description
iam:PassRole	Passes an IAM role to Lambda as the function’s execution role.
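
To see which roles the PassRole scope admits, the wildcard pattern can be checked against candidate role ARNs. This is a rough sketch that treats `*` as any sequence of characters and `?` as any single character. The account ID, prefix, and role names are hypothetical.

```python
import re


def iam_pattern_matches(pattern, value):
    """Match an IAM-style wildcard pattern: * is any sequence, ? is any single character."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, value) is not None


# Pattern after placeholder substitution (hypothetical account and prefix).
pattern = "arn:aws:iam::123456789012:role/acme-dev-*-role-*"

candidates = [
    "arn:aws:iam::123456789012:role/acme-dev-inference-role-lambda",
    "arn:aws:iam::123456789012:role/acme-prod-inference-role-lambda",
    "arn:aws:iam::123456789012:role/AdministratorAccess",
]
for arn in candidates:
    print(arn, iam_pattern_matches(pattern, arn))
```

Only the first candidate matches, so a Level 2 principal could not attach a production role or an unrelated administrative role to a function.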

"Sid": "DenyDeleteAndPermissionChanges"

Explicitly prevents deletion of functions, aliases, layers, concurrency configs, and URL configs. Also blocks changes to resource-based policies (AddPermission/RemovePermission) which control cross-account access. These destructive and permission-escalation actions are reserved for Level 3 (full).

Policy Action Denied	Description
lambda:DeleteFunction	(Denied) Deletes a Lambda function and all its versions.
lambda:DeleteAlias	(Denied) Deletes a function alias.
lambda:DeleteFunctionUrlConfig	(Denied) Removes the function URL endpoint.
lambda:DeleteFunctionConcurrency	(Denied) Removes reserved concurrency from a function.
lambda:DeleteFunctionEventInvokeConfig	(Denied) Removes asynchronous invocation configuration.
lambda:DeleteProvisionedConcurrencyConfig	(Denied) Removes provisioned concurrency allocation.
lambda:DeleteLayerVersion	(Denied) Deletes a published layer version.
lambda:AddPermission	(Denied) Adds a statement to the function’s resource-based policy.
lambda:RemovePermission	(Denied) Removes a statement from the function’s resource-based policy.
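
The effect of the explicit Deny can be illustrated with a deliberately simplified evaluator. It is a sketch of the precedence rule only (explicit deny, then allow, then implicit deny) and ignores Condition blocks, NotAction, NotResource, permission boundaries, and SCPs. The `OtherPolicyAllowsAll` statement is a hypothetical second policy attached to the same principal.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def evaluate(statements, action, resource):
    """Return 'Deny', 'Allow', or 'ImplicitDeny' for one action/resource pair."""
    decision = "ImplicitDeny"
    for stmt in statements:
        # IAM action names are case-insensitive, so compare in lowercase.
        action_hit = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt["Action"]))
        resource_hit = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if not (action_hit and resource_hit):
            continue
        if stmt["Effect"] == "Deny":
            return "Deny"  # an explicit deny always wins
        decision = "Allow"
    return decision


deny_deletes = {"Sid": "DenyDeleteAndPermissionChanges", "Effect": "Deny",
                "Action": ["lambda:DeleteFunction", "lambda:AddPermission"], "Resource": "*"}
other_policy = {"Sid": "OtherPolicyAllowsAll", "Effect": "Allow",
                "Action": "lambda:*", "Resource": "*"}

fn_arn = "arn:aws:lambda:us-west-2:123456789012:function:acme-dev-classifier"
print(evaluate([other_policy, deny_deletes], "lambda:DeleteFunction", fn_arn))
print(evaluate([other_policy, deny_deletes], "lambda:UpdateFunctionCode", fn_arn))
```

The first call is denied even though the other policy allows `lambda:*`, which is why the Deny statement is kept although Level 2 never grants delete actions itself.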

Level 3: full¶

Purpose: Complete Lambda inference function lifecycle management

Principal: Human (platform admins, MLOps leads)

Typical Users:

Platform administrators
MLOps team leads
Infrastructure engineers

How It Differs from deploy-manage (Level 2):

Adds delete lifecycle — can remove functions, aliases, layers, versions, configs
Adds resource policy management — AddPermission/RemovePermission for cross-account invocation control
Adds code signing enforcement — manage code signing configurations
Adds event source mapping management — create/update/delete for event-driven inference
Account-wide scope — no function name restrictions (lambda:*)

What You Can Do:

✅ Everything in deploy-manage, PLUS:
✅ Delete functions, aliases, layers, versions, and configs
✅ Manage resource-based policies (cross-account invocation control)
✅ Manage code signing configurations
✅ Create, update, and delete event source mappings
✅ Full account-wide access — not restricted to naming conventions

What You Cannot Do:

❌ Nothing — full Lambda inference management

Example Scenario:

The platform team is decommissioning a retired product line. They need to delete the associated Lambda inference functions, remove their event source mappings, clean up resource-based policies that granted cross-account access, and delete the Lambda layers that were dedicated to those functions.

Sample Permissions:

[
  {
    "Sid": "LambdaFullAccess",
    "Effect": "Allow",
    "Action": "lambda:*",
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToLambda",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "lambda.amazonaws.com"
      }
    }
  }
]

Why lambda:* instead of explicit action enumeration? Lambda’s API surface grows frequently as AWS adds new features. Unlike SageMaker, where new actions can spin up expensive training jobs or endpoints, new Lambda actions tend to carry low cost and a small blast radius. Maintaining an explicit list of 50+ actions creates maintenance debt: the list goes stale and deployments break when AWS adds actions. The Resource: * scope is acceptable here because Level 3 principals are platform administrators who need to govern the entire account, including functions that may not follow naming conventions.

"Sid": "LambdaFullAccess"

Grants full Lambda access across all resource types in the account — functions, layers, event source mappings, code signing configs, and any future Lambda resource types. This is the administrative level for platform teams who manage the complete Lambda inference lifecycle.

Policy Action	Description
lambda:*	All Lambda actions — create, read, update, delete, invoke, and manage across all Lambda resource types.

"Sid": "PassRoleToLambda"

Allows passing an IAM execution role to Lambda when creating or updating functions. Even at full Lambda access, PassRole remains scoped to tenant-prefixed roles to prevent privilege escalation via arbitrary role attachment.

Policy Action	Description
iam:PassRole	Passes an IAM role to Lambda as the function’s execution role. Scoped to `{company_prefix}-{env}-*-role-*` and conditioned to the Lambda service only.

TODO: VPC Governance Subsection

Document two config-driven VPC condition key patterns that apply to both Level 2 and Level 3:

Enforce specific VPC: Deny UpdateFunctionConfiguration when lambda:VpcIds doesn’t match config value

Deny all VPC: Deny UpdateFunctionConfiguration when lambda:VpcIds is present

Config schema: lambda_inference.vpc_policy (“enforce” | “deny” | “none”)
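
The two patterns in this TODO could be generated from the config value roughly as follows. This is an untested sketch: the Sid names, the `allowed_vpc_ids` parameter, and the choice of the StringNotEquals and Null condition operators are assumptions that should be validated against the Lambda condition key documentation before use.

```python
VPC_KEY = "lambda:VpcIds"


def vpc_governance_statements(vpc_policy, allowed_vpc_ids=()):
    """Translate a vpc_policy config value into zero or one Deny statements."""
    if vpc_policy == "none":
        return []
    if vpc_policy == "enforce":
        if not allowed_vpc_ids:
            raise ValueError("'enforce' needs at least one VPC ID")
        sid = "DenyUnapprovedVpc"  # hypothetical Sid
        condition = {"StringNotEquals": {VPC_KEY: list(allowed_vpc_ids)}}
    elif vpc_policy == "deny":
        sid = "DenyAnyVpc"  # hypothetical Sid
        # Null with "false" is true when the key is present in the request.
        condition = {"Null": {VPC_KEY: "false"}}
    else:
        raise ValueError(f"unknown vpc_policy: {vpc_policy!r}")
    return [{
        "Sid": sid,
        "Effect": "Deny",
        "Action": "lambda:UpdateFunctionConfiguration",
        "Resource": "*",
        "Condition": condition,
    }]


print(vpc_governance_statements("enforce", ["vpc-0abc1234"]))
print(vpc_governance_statements("deny"))
```

The sketch follows the TODO and covers only UpdateFunctionConfiguration. The lambda:VpcIds key also applies to CreateFunction, so a complete version would likely list both actions.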

Bedrock Inference¶

Bedrock Inference policies control access to foundation models (FMs) for generative AI workloads. Unlike SageMaker, where you manage your own endpoints, Bedrock is a fully managed service, so the policy focus is on which models can be invoked, where inference runs (cross-region), and how much throughput is provisioned.

Level 1: invoke-only¶

Purpose: Call foundation models for predictions without managing model access or throughput

Principal: Machine (backend services, chatbots) or Human (developers, analysts)

Typical Users:

Backend services integrating generative AI
Customer-facing chatbots and assistants
Developers prototyping with foundation models
Analysts using text summarization or classification

What You Can Do:

✅ Invoke allowed foundation models
✅ Use the Converse API for chat-based interactions
✅ List available foundation models
✅ View model details and capabilities

What You Cannot Do:

❌ Enable or disable model access
❌ Create or manage provisioned throughput
❌ Configure cross-region inference
❌ Manage custom models or fine-tuning jobs
❌ Create or modify guardrails

Example Scenario:

A customer support chatbot needs to call Claude for generating responses. The service role can invoke the model but cannot change which models are available or provision dedicated throughput.

Sample Permissions:

[
  {
    "Sid": "BedrockDiscovery",
    "Effect": "Allow",
    "Action": [
      "bedrock:ListFoundationModels",
      "bedrock:GetFoundationModel"
    ],
    "Resource": "*"
  },
  {
    "Sid": "BedrockStandardInference",
    "Effect": "Allow",
    "Action": [
      "bedrock:InvokeModel",
      "bedrock:InvokeModelWithResponseStream"
    ],
    "Resource": "*"
  },
  {
    "Sid": "BedrockConverseInference",
    "Effect": "Allow",
    "Action": [
      "bedrock:Converse",
      "bedrock:ConverseStream"
    ],
    "Resource": "*"
  }
]
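
The samples in this guide are statement arrays rather than complete policy documents. A minimal sketch of wrapping one into an attachable document, with a size check against the 6,144-character quota for customer managed policies, might look like this. The helper names are illustrative, and the statements are abbreviated from the sample above.

```python
import json

MANAGED_POLICY_CHAR_LIMIT = 6144  # IAM quota for a customer managed policy


def to_policy_document(statements):
    """Wrap a list of statements in the standard policy envelope."""
    return {"Version": "2012-10-17", "Statement": statements}


def policy_size(document):
    # IAM does not count whitespace, so measure the compact encoding.
    return len(json.dumps(document, separators=(",", ":")))


invoke_only = [
    {"Sid": "BedrockDiscovery", "Effect": "Allow",
     "Action": ["bedrock:ListFoundationModels", "bedrock:GetFoundationModel"],
     "Resource": "*"},
    {"Sid": "BedrockStandardInference", "Effect": "Allow",
     "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
     "Resource": "*"},
]

document = to_policy_document(invoke_only)
size = policy_size(document)
print(size, "characters;", "OK" if size <= MANAGED_POLICY_CHAR_LIMIT else "too large")
```

The same check is more useful for the longer samples, such as the Lambda Level 2 policy, where many explicitly listed actions bring the document closer to the quota.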

"Sid": "BedrockDiscovery"

Allows users to discover available foundation models and view their specific capabilities and details.

Policy Action	Description
bedrock:ListFoundationModels	Lists the foundation models available in Amazon Bedrock, which is necessary to identify which models can be invoked.
bedrock:GetFoundationModel	Retrieves detailed information about a specific foundation model, such as input/output modalities and customization support.

"Sid": "BedrockStandardInference"

Enables the core ability to send prompts and receive responses from foundation models, either as a single payload or as a stream. Chat-specific APIs are covered by the separate BedrockConverseInference statement.

Policy Action	Description
bedrock:InvokeModel	Sends a prompt to a specified model and receives the entire response in a single payload.
bedrock:InvokeModelWithResponseStream	Sends a prompt to a model and receives the response as a series of tokens (streaming), ideal for real-time applications.

"Sid": "BedrockConverseInference"

Enables multi-turn chat interactions via the Converse API. Kept as a separate Sid from BedrockStandardInference for three reasons:

Resource-level control — allows scoping Converse to specific foundation models or inference profiles independently from InvokeModel (e.g., for cost tracking)
Streaming restrictions — if compliance requires disabling streaming (which can bypass certain content inspection or logging), ConverseStream can be split into its own Sid
Auditability — separate Sids make it easier to identify which statement granted a specific permission in IAM policy evaluation

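Note: bedrock:Converse and bedrock:ConverseStream were not listed in the AWS IAM Service Authorization Reference at time of writing. The Bedrock API reference states that the Converse operation requires the bedrock:InvokeModel permission and ConverseStream requires bedrock:InvokeModelWithResponseStream, so the BedrockStandardInference statement is what authorizes Converse calls. Listing the Converse actions explicitly does not change how requests are authorized.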

Policy Action	Description
bedrock:Converse	Provides a consistent API for multi-turn chat conversations, managing message history and formatting for supported models.
bedrock:ConverseStream	Allows for multi-turn chat conversations with the benefit of streaming responses for lower perceived latency.
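
The message history that the Converse API works with can be sketched as plain data. The helpers below only build the request and run without AWS access. The helper names, the prompt text, and the `max_tokens` default are illustrative assumptions, and the model ID is just an example of a Claude 3 Haiku identifier.

```python
def add_turn(messages, role, text):
    """Append one turn to the conversation history in Converse message format."""
    if role not in ("user", "assistant"):
        raise ValueError("role must be 'user' or 'assistant'")
    messages.append({"role": role, "content": [{"text": text}]})
    return messages


def converse_kwargs(model_id, messages, max_tokens=512):
    """Build keyword arguments for a Converse request."""
    return {
        "modelId": model_id,
        "messages": messages,
        "inferenceConfig": {"maxTokens": max_tokens},
    }


history = []
add_turn(history, "user", "Summarize my last support ticket.")
add_turn(history, "assistant", "Your ticket reported a login failure on mobile.")
add_turn(history, "user", "Has it been resolved?")

request = converse_kwargs("anthropic.claude-3-haiku-20240307-v1:0", history)
print(len(request["messages"]), "messages in the request")
```

With boto3 installed and credentials configured, the request could be sent as `boto3.client("bedrock-runtime").converse(**request)`. The caller resends the full history on every call, since the service does not store conversation state between requests.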

Level 2: model-manage¶

Purpose: Manage model access, guardrails, and inference configurations

Principal: Human (ML engineers, AI/ML team leads)

Typical Users:

ML engineers configuring model access for teams
AI/ML team leads managing guardrails and content filters
DevOps engineers setting up cross-region inference
Data scientists managing custom model imports

How It Differs from invoke-only (Level 1):

Adds model access management — enable/disable foundation models for the account
Adds guardrail management — create and configure content filters and safety controls
Adds cross-region configuration — control where inference requests are routed
Still no throughput provisioning or deletion — cost-impacting decisions require Level 3

What You Can Do:

✅ Everything in invoke-only, PLUS:
✅ Enable and disable foundation model access
✅ Create and configure guardrails (content filters, topic blocks)
✅ Manage custom model imports
✅ Configure cross-region inference profiles
✅ View usage metrics and invocation logs

What You Cannot Do:

❌ Create or delete provisioned throughput (cost-impacting)
❌ Delete guardrails
❌ Manage account-level Bedrock settings

Example Scenario:

The AI/ML team lead needs to enable a new Anthropic model for the development team, create a guardrail that blocks PII in model responses, and configure cross-region inference so requests can fail over to us-east-1 if us-west-2 is at capacity.

Sample Permissions:

[
  {
    "Sid": "BedrockStandardInference",
    "Effect": "Allow",
    "Action": [
      "bedrock:InvokeModel",
      "bedrock:InvokeModelWithResponseStream"
    ],
    "Resource": "*"
  },
  {
    "Sid": "BedrockConverseInference",
    "Effect": "Allow",
    "Action": [
      "bedrock:Converse",
      "bedrock:ConverseStream"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ModelAccessManagement",
    "Effect": "Allow",
    "Action": [
      "bedrock:GetFoundationModel",
      "bedrock:ListFoundationModels",
      "bedrock:PutModelInvocationLoggingConfiguration",
      "bedrock:GetModelInvocationLoggingConfiguration",
      "bedrock:ListModelInvocationJobs",
      "bedrock:PutFoundationModelEntitlement",
      "bedrock:PutUseCaseForModelAccess",
      "bedrock:ListFoundationModelAgreementOffers",
      "bedrock:CreateFoundationModelAgreement",
      "bedrock:GetFoundationModelAvailability",
      "bedrock:DeleteFoundationModelAgreement"
    ],
    "Resource": "*"
  },
  {
    "Sid": "GuardrailManagement",
    "Effect": "Allow",
    "Action": [
      "bedrock:CreateGuardrail",
      "bedrock:UpdateGuardrail",
      "bedrock:CreateGuardrailVersion",
      "bedrock:GetGuardrail",
      "bedrock:ListGuardrails"
    ],
    "Resource": "*"
  },
  {
    "Sid": "CustomModelImportManagement",
    "Effect": "Allow",
    "Action": [
      "bedrock:ImportModel",
      "bedrock:GetCustomModel",
      "bedrock:ListCustomModels",
      "bedrock:CreateModelImportJob",
      "bedrock:GetModelImportJob",
      "bedrock:ListModelImportJobs",
      "bedrock:StopModelImportJob"
    ],
    "Resource": "*"
  },
  {
    "Sid": "CrossRegionInferenceManagement",
    "Effect": "Allow",
    "Action": [
      "bedrock:CreateInferenceProfile",
      "bedrock:GetInferenceProfile",
      "bedrock:ListInferenceProfiles",
      "bedrock:UpdateInferenceProfile"
    ],
    "Resource": "*"
  },
  {
    "Sid": "ObservabilityAndMetrics",
    "Effect": "Allow",
    "Action": [
      "cloudwatch:GetMetricData",
      "cloudwatch:ListMetrics",
      "logs:DescribeLogGroups",
      "logs:GetLogEvents"
    ],
    "Resource": "*"
  },
  {
    "Sid": "DenyBedrockDeleteOperations",
    "Effect": "Deny",
    "Action": [
      "bedrock:DeleteCustomModel",
      "bedrock:DeleteModelInvocationLoggingConfiguration",
      "bedrock:DeleteProvisionedModelThroughput",
      "bedrock:DeleteModelImportJob",
      "bedrock:DeleteCustomModelDeployment",
      "bedrock:DeleteInferenceProfile",
      "bedrock:DeletePromptRouter",
      "bedrock:DeleteGuardrail",
      "bedrock:DeleteKnowledgeBase",
      "bedrock:DeleteAgent"
    ],
    "Resource": "*"
  }
]

"Sid": "BedrockStandardInference"

Enables the core ability to send prompts and receive responses from foundation models, either as a single payload or as a stream. Chat-specific APIs are covered by the separate BedrockConverseInference statement.

Policy Action	Description
bedrock:InvokeModel	Sends a prompt to a specified model and receives the entire response in a single payload.
bedrock:InvokeModelWithResponseStream	Sends a prompt to a model and receives the response as a series of tokens (streaming), ideal for real-time applications.

"Sid": "BedrockConverseInference"

Multi-turn chat interactions via the Converse API. Kept as a separate Sid for independent resource scoping, streaming control, and auditability (see Level 1 rationale).

Policy Action	Description
bedrock:Converse	Provides a consistent API for multi-turn chat conversations, managing message history and formatting for supported models.
bedrock:ConverseStream	Allows for multi-turn chat conversations with the benefit of streaming responses for lower perceived latency.

"Sid": "ModelAccessManagement"

Permissions to enable, disable, and manage entitlements and agreements for foundation models within the account. Also covers the model invocation logging configuration.

Policy Action	Description
bedrock:GetFoundationModel	Retrieves detailed information and properties about a specific Amazon Bedrock foundation model.
bedrock:ListFoundationModels	Lists all foundation models available in Amazon Bedrock for the current region.
bedrock:PutModelInvocationLoggingConfiguration	Configures where to store model invocation logs, such as S3 buckets or CloudWatch Logs.
bedrock:GetModelInvocationLoggingConfiguration	Retrieves the current configuration for model invocation logging.
bedrock:ListModelInvocationJobs	Lists asynchronous model invocation jobs to track batch processing status.
bedrock:PutFoundationModelEntitlement	Submits a request for foundation model entitlement. Largely automated for most models but still required for certain provider-specific access flows.
bedrock:PutUseCaseForModelAccess	Submits the required provider use-case form for first-time model access, such as the Anthropic use-case disclosure. One-time per account or organization.
bedrock:ListFoundationModelAgreementOffers	Grants permission to view available agreement offers for foundation models.
bedrock:CreateFoundationModelAgreement	Grants permission to officially accept an offer and create a new agreement for a foundation model.
bedrock:GetFoundationModelAvailability	Grants permission to check if a specific foundation model is available for use in your account or region.
bedrock:DeleteFoundationModelAgreement	Grants permission to terminate or delete an existing foundation model agreement.

"Sid": "GuardrailManagement"

Permissions to create and configure safety controls, content filters, and PII masking without deletion rights.

Policy Action	Description
bedrock:CreateGuardrail	Creates a new guardrail to filter sensitive content or block specific topics in model responses.
bedrock:UpdateGuardrail	Modifies existing guardrail configurations, such as updating filter strengths or blocked words.
bedrock:CreateGuardrailVersion	Creates a snapshot version of a guardrail for consistent deployment across environments.
bedrock:GetGuardrail	Retrieves the detailed configuration of a specific guardrail.
bedrock:ListGuardrails	Lists all guardrails defined in the account.

"Sid": "CustomModelImportManagement"

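Manages custom model imports: starting, monitoring, and stopping import jobs, and viewing the resulting custom models. Cross-region routing is handled by the separate CrossRegionInferenceManagement statement.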

Policy Action	Description
bedrock:ImportModel	Initiates the process of importing a custom model into Amazon Bedrock.
bedrock:GetCustomModel	Retrieves details about a custom or imported model.
bedrock:ListCustomModels	Lists all custom models available in the account.
bedrock:CreateModelImportJob	Starts the process of importing a custom model into Bedrock.
bedrock:GetModelImportJob	Retrieves detailed information and the current status of a specific import job.
bedrock:ListModelImportJobs	Returns a list of all model import jobs submitted.
bedrock:StopModelImportJob	Immediately cancels a model import job that is currently in progress.

"Sid": "CrossRegionInferenceManagement"

Manages inference profiles, which route requests and track model usage across one or more AWS Regions.

Policy Action	Description
bedrock:CreateInferenceProfile	Sets up cross-region inference profiles to manage request routing and failover.
bedrock:GetInferenceProfile	Retrieves details about a specific inference profile.
bedrock:ListInferenceProfiles	Lists available inference profiles for the account.
bedrock:UpdateInferenceProfile	Grants permission to modify the settings of an existing application inference profile, such as updating its description or configuration.

"Sid": "ObservabilityAndMetrics"

Grants permissions to access CloudWatch metrics and logs related to Bedrock model invocations for monitoring and troubleshooting.

Policy Action	Description
cloudwatch:GetMetricData	Grants permission to retrieve batch amounts of CloudWatch metric data and perform metric math on the retrieved data.
cloudwatch:ListMetrics	Grants permission to retrieve a list of valid metrics stored for the AWS account owner, which can then be used to get statistical data.
logs:DescribeLogGroups	Grants permission to return all log groups associated with the requesting AWS account, including data sources that ingest into them.
logs:GetLogEvents	Grants permission to retrieve individual log events from a specific log stream, with the ability to filter results by time range.

"Sid": "DenyBedrockDeleteOperations"

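Explicitly denies delete operations across Bedrock resource types, including custom models, guardrails, inference profiles, provisioned throughput, agents, and knowledge bases. Because an explicit Deny overrides any Allow, these actions stay blocked even if another attached policy grants them. Deletion is reserved for Level 3 (full).

Policy Action Denied	Description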
bedrock:DeleteCustomModel	Deletes a custom model that was previously created through model customization (fine-tuning).
bedrock:DeleteModelInvocationLoggingConfiguration	Removes the configuration that logs model inputs and outputs to S3 or CloudWatch, which is often used for auditing.
bedrock:DeleteProvisionedModelThroughput	Deletes a Provisioned Throughput reservation; note that this typically cannot be done before a commitment term ends.
bedrock:DeleteModelImportJob	Deletes a record or job associated with importing a customized model from other environments like Amazon SageMaker.
bedrock:DeleteCustomModelDeployment	Stops and removes a deployed custom model, making its ARN unavailable for further inference.
bedrock:DeleteInferenceProfile	Deletes an inference profile, which is used to manage and track model invocation across different regions or configurations.
bedrock:DeletePromptRouter	Removes a prompt router used to direct incoming requests to specific models or versions.
bedrock:DeleteGuardrail	Deletes a Bedrock Guardrail, which provides content filtering and safety controls for generative AI applications.
bedrock:DeleteKnowledgeBase	Deletes a Knowledge Base resource used for Retrieval-Augmented Generation (RAG) workflows.
bedrock:DeleteAgent	Deletes an Amazon Bedrock Agent that automates tasks by interacting with foundation models and other AWS services.

Level 3: full¶

Purpose: Complete Bedrock platform management including cost-impacting operations

Principal: Human (platform admins, cloud architects)

Typical Users:

Platform administrators
Cloud architects
FinOps engineers (provisioned throughput decisions)
Security team (account-level controls)

How It Differs from model-manage (Level 2):

Adds provisioned throughput — create, modify, and delete dedicated model capacity (significant cost)
Adds delete operations — can remove guardrails, custom models, inference profiles, agents, knowledge bases
Adds account-level settings — manage Bedrock service-level configurations
Full fine-tuning control — create and manage model customization jobs
Full platform governance — agents, knowledge bases, evaluations, prompt routers, batch inference

What You Can Do:

✅ Everything in model-manage, PLUS:
✅ Create and delete provisioned throughput (dedicated capacity)
✅ Delete guardrails, custom models, inference profiles, agents, knowledge bases
✅ Manage model fine-tuning and customization jobs
✅ Configure account-level Bedrock settings
✅ Manage agents, knowledge bases, evaluations, and prompt routers

What You Cannot Do:

❌ Nothing — full Bedrock management

Example Scenario:

The platform team needs to provision dedicated throughput for the production chatbot ahead of a product launch, clean up unused guardrails from a decommissioned project, and configure account-level logging for all Bedrock invocations.

Sample Permissions:

[
  {
    "Sid": "BedrockFullAccess",
    "Effect": "Allow",
    "Action": "bedrock:*",
    "Resource": "*"
  },
  {
    "Sid": "BedrockFullObservability",
    "Effect": "Allow",
    "Action": [
      "cloudwatch:GetMetricData",
      "cloudwatch:ListMetrics",
      "logs:DescribeLogGroups",
      "logs:GetLogEvents"
    ],
    "Resource": "*"
  },
  {
    "Sid": "PassRoleToBedrock",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
    "Condition": {
      "StringEquals": {
        "iam:PassedToService": "bedrock.amazonaws.com"
      }
    }
  }
]

Why bedrock:* instead of explicit action enumeration? Bedrock’s API surface is growing rapidly as AWS adds agents, knowledge bases, evaluations, prompt routers, batch inference, marketplace models, and more. Unlike SageMaker — where new actions can spin up expensive training jobs or endpoints — Bedrock’s cost-impacting actions are limited to provisioned throughput, which admins explicitly manage. Maintaining an explicit list of 80+ actions creates maintenance debt that leads to stale policies and broken deployments when AWS adds actions. The Resource: * scope is acceptable here because Level 3 principals are platform administrators who need to govern the entire account.

"Sid": "BedrockFullAccess"

Grants full Bedrock access across all resource types in the account — foundation models, custom models, guardrails, inference profiles, provisioned throughput, agents, knowledge bases, evaluations, and any future Bedrock resource types. This is the administrative level for platform teams who manage the complete Bedrock lifecycle.

Policy Action	Description
bedrock:*	All Bedrock actions — invoke, model access, guardrails, provisioned throughput, custom models, agents, knowledge bases, evaluations, prompt routers, and all future actions.

"Sid": "BedrockFullObservability"

Grants permissions to access CloudWatch metrics and logs related to Bedrock model invocations for monitoring and troubleshooting. These are non-Bedrock namespace actions that bedrock:* does not cover.

Policy Action	Description
cloudwatch:GetMetricData	Grants permission to retrieve batch amounts of CloudWatch metric data and perform metric math on the retrieved data.
cloudwatch:ListMetrics	Grants permission to retrieve a list of valid metrics stored for the AWS account owner, which can then be used to get statistical data.
logs:DescribeLogGroups	Grants permission to return all log groups associated with the requesting AWS account, including data sources that ingest into them.
logs:GetLogEvents	Grants permission to retrieve individual log events from a specific log stream, with the ability to filter results by time range.

"Sid": "PassRoleToBedrock"

Allows passing an IAM service role to Bedrock for operations that require it, such as model customization (fine-tuning) jobs and model import jobs. Even at full Bedrock access, PassRole remains scoped to tenant-prefixed roles to prevent privilege escalation via arbitrary role attachment.

Policy Action	Description
iam:PassRole	Passes an IAM role to Bedrock as the service role for customization and import jobs. Scoped to `{company_prefix}-{env}-*-role-*` and conditioned to the Bedrock service only.
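
A minimal sketch of how such a statement might be rendered, assuming the `{company_prefix}-{env}-*-role-*` role pattern used elsewhere in this guide and the standard `iam:PassedToService` condition key. The function name and the example account, prefix, and environment values are hypothetical, not taken from the actual template.

```python
import json


def pass_role_statement(account_id: str, company_prefix: str, env: str) -> dict:
    """Build a PassRole statement scoped to tenant-prefixed roles (illustrative only)."""
    return {
        "Sid": "PassRoleToBedrock",
        "Effect": "Allow",
        "Action": "iam:PassRole",
        # Role names must follow the {company_prefix}-{env}-*-role-* convention.
        "Resource": f"arn:aws:iam::{account_id}:role/{company_prefix}-{env}-*-role-*",
        # The role may only be handed to the Bedrock service.
        "Condition": {
            "StringEquals": {"iam:PassedToService": "bedrock.amazonaws.com"}
        },
    }


# Hypothetical example values.
statement = pass_role_statement("123456789012", "acme", "dev")
print(json.dumps(statement, indent=2))
```

Keeping both the resource pattern and the service condition means a role outside the tenant prefix, or a role handed to a different service, is not covered by this Allow.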

TODO: Config-Driven Bedrock Model Scoping

Add an inference section to client configs that controls Resource ARN generation for Bedrock invoke actions.

Config schema:

inference:
  bedrock:
    allowed_models:          # list of foundation model ID patterns
      - "anthropic.claude-3-sonnet-*"
      - "anthropic.claude-3-haiku-*"
      - "amazon.titan-embed-text-v1"
    allowed_regions:         # for cross-region inference profiles
      - "us-west-2"
      - "us-east-1"

Template generator behavior:

allowed_models present → Resource becomes list of arn:aws:bedrock:{region}::foundation-model/<model-id> ARNs

allowed_models absent or ["*"] → Resource stays "*"

Applies to Level 1 and Level 2 invoke Sids; Level 3 (full) always uses "*"
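
The generator behavior listed above could be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: the function name is hypothetical, an empty list is treated the same as an absent key, and every model pattern is expanded across every region in `allowed_regions`.

```python
from typing import List, Optional, Union


def bedrock_invoke_resources(
    allowed_models: Optional[List[str]],
    allowed_regions: List[str],
) -> Union[str, List[str]]:
    """Return the Resource value for Level 1 and Level 2 invoke Sids."""
    # Absent, empty (assumption), or ["*"] means no restriction: Resource stays "*".
    if not allowed_models or allowed_models == ["*"]:
        return "*"
    # Foundation-model ARNs carry no account ID, hence the "::" after the region.
    return [
        f"arn:aws:bedrock:{region}::foundation-model/{model}"
        for region in allowed_regions
        for model in allowed_models
    ]


resources = bedrock_invoke_resources(
    ["anthropic.claude-3-sonnet-*", "amazon.titan-embed-text-v1"],
    ["us-west-2", "us-east-1"],
)
for arn in resources:
    print(arn)
```

Level 3 (full) would bypass this helper entirely, since it always uses "*".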

Per-tier defaults:

Startup: omit or ["*"] (no restriction, encourage exploration)

Medium: explicit model list (cost control)

Enterprise: explicit model list (compliance requirement)

Code changes needed:

Add inference key to validation schemas (validation-schema-startup.yaml, medium, enterprise)

Update template generator to read inference.bedrock.allowed_models and build Resource ARN list

Update client configs with inference section (all 3 tiers)

Add unit tests for ARN generation with and without allowed_models

Also include in the inference section (for consistency with existing designs):

inference:
  sagemaker:
    endpoint_prefix: "{company_prefix}-{env}"
  lambda:
    vpc_policy: "none"       # "enforce" | "deny" | "none"
  bedrock:
    allowed_models: ["*"]
    allowed_regions: ["us-west-2"]

KMS Policies¶

KMS (Key Management Service) policies control read-only access to encryption keys used across the MLOps platform. KMS keys protect S3 objects, SageMaker model artifacts, and other sensitive data.

Level 1: read-only¶

Purpose: Verify encryption settings and key configurations without the ability to encrypt, decrypt, or modify keys.

Typical Users:

Operations support (verify encryption compliance)
Security auditors (review key policies and rotation status)
Compliance reviewers (confirm encryption standards)

What You Can Do:

✅ View key metadata, policies, and rotation status
✅ List all KMS keys and aliases in the account
✅ View resource tags on keys
✅ Retrieve public keys (for asymmetric keys)

What You Cannot Do:

❌ No Encrypt/Decrypt: Cannot use keys to encrypt or decrypt data
❌ No Key Management: Cannot create, disable, delete, or schedule deletion of keys
❌ No Policy Changes: Cannot modify key policies or grants
❌ No Key Rotation Changes: Cannot enable or disable automatic key rotation

Sample Permissions:

[
  {
    "Sid": "ReadOnlyAccessForAllKMSKeysInAccount",
    "Effect": "Allow",
    "Action": [
      "kms:GetPublicKey",
      "kms:GetKeyRotationStatus",
      "kms:GetKeyPolicy",
      "kms:DescribeKey",
      "kms:ListKeyPolicies",
      "kms:ListResourceTags",
      "tag:GetResources"
    ],
    "Resource": "arn:aws:kms:*:{account_id}:key/*"
  },
  {
    "Sid": "ReadOnlyAccessForOperationsWithNoKMSKey",
    "Effect": "Allow",
    "Action": [
      "kms:ListKeys",
      "kms:ListAliases"
    ],
    "Resource": "*"
  }
]

"Sid": "ReadOnlyAccessForAllKMSKeysInAccount"

Grants read-only access to individual KMS key metadata, scoped to the account.

Policy Action	Description
kms:GetPublicKey	Retrieve the public key of an asymmetric KMS key.
kms:GetKeyRotationStatus	Check whether automatic key rotation is enabled for a key.
kms:GetKeyPolicy	View the resource-based policy attached to a KMS key.
kms:DescribeKey	Retrieve metadata about a KMS key (creation date, state, key spec).
kms:ListKeyPolicies	List the names of key policies attached to a key.
kms:ListResourceTags	View tags associated with a KMS key.
tag:GetResources	Query resources by tag across services (supports KMS key discovery by tag).

"Sid": "ReadOnlyAccessForOperationsWithNoKMSKey"

Grants account-wide discovery actions that don’t target a specific key.

Policy Action	Description
kms:ListKeys	List all KMS key IDs in the account.
kms:ListAliases	List all key aliases for easy identification of keys by name.

Note: AWS does not provide a managed KMS read-only policy (there is no AWSKeyManagementServiceReadOnlyAccess managed policy), so this access level is implemented as a custom policy template.
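
Because this template is custom, a small lint can help keep it read-only as it evolves. The sketch below flags any action whose name does not start with Get, List, or Describe. That verb-prefix rule is an assumption for illustration, not an AWS classification, and the helper name is hypothetical.

```python
from typing import Dict, List

# Verb prefixes treated as read-only here (a heuristic, review flagged actions by hand).
READ_ONLY_PREFIXES = ("Get", "List", "Describe")

# Actions copied from the sample permissions above; Resource is omitted for brevity.
KMS_READ_ONLY_STATEMENTS = [
    {
        "Sid": "ReadOnlyAccessForAllKMSKeysInAccount",
        "Effect": "Allow",
        "Action": [
            "kms:GetPublicKey",
            "kms:GetKeyRotationStatus",
            "kms:GetKeyPolicy",
            "kms:DescribeKey",
            "kms:ListKeyPolicies",
            "kms:ListResourceTags",
            "tag:GetResources",
        ],
    },
    {
        "Sid": "ReadOnlyAccessForOperationsWithNoKMSKey",
        "Effect": "Allow",
        "Action": ["kms:ListKeys", "kms:ListAliases"],
    },
]


def non_read_only_actions(statements: List[Dict]) -> List[str]:
    """Return every listed action whose name lacks a read-only verb prefix."""
    flagged = []
    for statement in statements:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            name = action.partition(":")[2]  # text after the service prefix
            if not name.startswith(READ_ONLY_PREFIXES):
                flagged.append(action)
    return flagged


print(non_read_only_actions(KMS_READ_ONLY_STATEMENTS))
```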

Trusted Advisor Policies¶

Trusted Advisor policies control read-only access to AWS Trusted Advisor checks and recommendations. Trusted Advisor evaluates your account against best practices for cost optimization, performance, security, fault tolerance, and service limits.

Level 1: read-only¶

Purpose: View Trusted Advisor check results and recommendations without the ability to refresh checks or modify preferences.

Typical Users:

FinOps managers (review cost optimization and performance recommendations)

What You Can Do:

✅ View all Trusted Advisor check details and summaries
✅ View flagged resources for each check
✅ View account Support plan and Trusted Advisor preferences

What You Cannot Do:

❌ No Refresh: Cannot refresh Trusted Advisor checks
❌ No Modifications: Cannot modify Trusted Advisor preferences or notification settings
❌ No Priority Access: Does not include Trusted Advisor Priority features (separate policy)

Sample Permissions:

[
  {
    "Sid": "TrustedAdvisorReadOnlyAccess",
    "Effect": "Allow",
    "Action": [
      "trustedadvisor:DescribeChecks",
      "trustedadvisor:DescribeCheckSummaries",
      "trustedadvisor:DescribeCheckItems",
      "trustedadvisor:DescribeAccount"
    ],
    "Resource": "*"
  }
]

"Sid": "TrustedAdvisorReadOnlyAccess"

Grants read-only access to Trusted Advisor checks and account information.

Policy Action	Description
trustedadvisor:DescribeChecks	View details for all Trusted Advisor checks.
trustedadvisor:DescribeCheckSummaries	View summaries of check results.
trustedadvisor:DescribeCheckItems	View specific details for flagged resources.
trustedadvisor:DescribeAccount	View Support plan and Trusted Advisor preferences.

Note: AWS does not provide a managed Trusted Advisor read-only policy (there is no AWSTrustedAdvisorReadOnlyAccess managed policy). The closest alternatives are AWSTrustedAdvisorPriorityReadOnlyAccess (scoped to Priority features only) and AWSSupportAccess (broader, includes check refresh). This custom template fills the gap with least-privilege read-only access.

Combined Policies¶

Combined policies merge multiple service-level read-only policies into a single managed policy. This is required when a group needs read-only access across many services and would otherwise exceed the AWS hard limit of 10 managed policies per group.

Why Combined Policies Exist¶

AWS IAM enforces a hard cap of 10 managed policies per group (cannot be increased). Groups like operations_support need:

Multiple AWS managed read-only policies (CloudWatch, X-Ray, KMS, etc.)
Multiple customer managed service-level policies (S3, ECR, SageMaker, Lambda, Bedrock)

When the total exceeds 10, we consolidate the customer managed service-level policies into a single combined policy.

Key Constraints¶

Constraint	Limit
Managed policies per group	10 (hard cap, no increase)
Managed policy size	6,144 characters (JSON)
Customer managed policies per account	1,500 (can increase to 5,000)
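
The first two constraints can be checked mechanically before a policy is attached. The sketch below is illustrative: the helper names are hypothetical, and the size calculation approximates IAM's whitespace-insensitive count by serializing the policy as compact JSON. The group counts in the usage lines are the operations_support figures from this guide.

```python
import json

MAX_MANAGED_POLICIES_PER_GROUP = 10  # from the table above
MAX_MANAGED_POLICY_CHARS = 6144      # from the table above


def policy_size(policy: dict) -> int:
    """Approximate the measured size: JSON characters without padding whitespace."""
    # separators=(",", ":") drops the spaces json.dumps would otherwise insert.
    return len(json.dumps(policy, separators=(",", ":")))


def within_group_budget(aws_managed: int, customer_managed: int) -> bool:
    """True when the group stays at or under the per-group attachment cap."""
    return aws_managed + customer_managed <= MAX_MANAGED_POLICIES_PER_GROUP


example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*"}
    ],
}

print(policy_size(example_policy) <= MAX_MANAGED_POLICY_CHARS)
print(within_group_budget(7, 5))  # 7 AWS managed + 5 individual policies
print(within_group_budget(7, 1))  # 7 AWS managed + 1 combined policy
```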

Design Rules¶

Individual service-level templates still exist — they are reused by other groups that don’t hit the limit
Combined policies are standalone templates — the generator treats them like any other policy, no special merge logic
Combined policies are group-specific — named for the group they serve (e.g., ops-services-read-only)
Only combine when forced — if a group is within the 10-policy limit, use individual service-level policies

ops-services-read-only¶

Purpose: Consolidates 5 service-level Level 1 policies (read-only for S3, ECR, and SageMaker; invoke-only for Lambda and Bedrock) into a single managed policy for the operations_support group.

Replaces:

s3: level1 (read-only)
ecr: level1 (read-only)
sagemaker: level1 (read-only)
lambda: level1 (invoke-only)
bedrock: level1 (invoke-only)

Policy Budget (operations_support):

Before	Count	After	Count
AWS managed policies	7	AWS managed policies	7
Customer managed (5 individual)	5	Customer managed (1 combined)	1
Total	12 🚨	Total	8 ✅

Size Check: ~3,856 characters (JSON) — well within the 6,144 character limit.

Sids (14 total):

Service	Sids	Source Template
S3	AllowListAllBuckets, AllowReadAndVersionAccess	s3/level1-read-only
ECR	AllowECRAuth, AllowReadOnlyPullAndMetadata	ecr/level1-read-only
SageMaker	SageMakerEndpointReadOnly, CloudWatchMetricsReadOnly, AutoScalingReadOnly, ExplicitDenyInference	sagemaker/level1-read-only
Lambda	LambdaDiscoveryListActions, LambdaDiscoveryActions, LambdaInvocationActions	lambda/level1-invoke-only
Bedrock	BedrockDiscovery, BedrockStandardInference, BedrockConverseInference	bedrock/level1-invoke-only

Template Location: policies/templates/combined/ops-services-read-only.yaml

Config Usage:

operations_support:
  managed_policies:
    - CloudWatchReadOnlyAccess
    - CloudWatchLogsReadOnlyAccess
    - AWSXrayReadOnlyAccess
    - AWSKeyManagementServiceReadOnlyAccess
    - ServiceQuotasReadOnlyAccess
    - IAMReadOnlyAccess
    - AmazonSNSReadOnlyAccess
  policy_assignments:
    combined: ops-services-read-only

Maintenance Note: If any of the 5 source service-level templates change (e.g., a new action added to S3 level1), the combined policy must be updated manually to stay in sync. This is an accepted trade-off — operations_support read-only policies change infrequently.
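
Since the sync is manual, a drift check in CI can catch a forgotten update. The sketch below compares Sid names only, using the Sids table above as its data. It does not compare actions or resources, and the function name is hypothetical. Loading Sids from the real YAML templates is left out.

```python
from typing import Dict, List

# Sids per source template, copied from the table above.
SOURCE_SIDS: Dict[str, List[str]] = {
    "s3/level1-read-only": ["AllowListAllBuckets", "AllowReadAndVersionAccess"],
    "ecr/level1-read-only": ["AllowECRAuth", "AllowReadOnlyPullAndMetadata"],
    "sagemaker/level1-read-only": [
        "SageMakerEndpointReadOnly",
        "CloudWatchMetricsReadOnly",
        "AutoScalingReadOnly",
        "ExplicitDenyInference",
    ],
    "lambda/level1-invoke-only": [
        "LambdaDiscoveryListActions",
        "LambdaDiscoveryActions",
        "LambdaInvocationActions",
    ],
    "bedrock/level1-invoke-only": [
        "BedrockDiscovery",
        "BedrockStandardInference",
        "BedrockConverseInference",
    ],
}


def missing_sids(
    source_sids: Dict[str, List[str]], combined_sids: List[str]
) -> Dict[str, List[str]]:
    """Map each source template to the Sids absent from the combined policy."""
    present = set(combined_sids)
    drift = {}
    for template, sids in source_sids.items():
        absent = [sid for sid in sids if sid not in present]
        if absent:
            drift[template] = absent
    return drift


# Simulate a combined policy that lost one Bedrock Sid.
all_sids = [sid for sids in SOURCE_SIDS.values() for sid in sids]
stale_combined = [sid for sid in all_sids if sid != "BedrockConverseInference"]
print(missing_sids(SOURCE_SIDS, stale_combined))
```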

mlops-services-a / b / c¶

Purpose: Consolidates 6 service-level deployment policies into 3 combined policies for the mlops_engineers group. Split into 3 because the 6 services combined exceed the 6,144 character managed policy size limit.

Split:

Policy	Services	Chars	Sids
`mlops-services-a`	S3 level2, ECR level3, Pipeline level3, SageMaker level3	~5,069	15
`mlops-services-b`	Lambda level2	~3,455	8
`mlops-services-c`	Bedrock level2	~2,896	8

Replaces:

s3: level2 (project-buckets-only)
ecr: level3 (ci-read-write)
pipeline: level3 (project-ci)
sagemaker: level3 (prod-invoke)
lambda: level2 (deploy-manage)
bedrock: level2 (model-manage)

Policy Budget (mlops_engineers):

Before	Count	After	Count
AWS managed policies	4	AWS managed policies	4
Customer managed (6 individual)	6	Customer managed (3 combined)	3
Total	10 ⚠️	Total	7 ✅

Template Locations:

policies/templates/combined/mlops-services-a.yaml
policies/templates/combined/mlops-services-b.yaml
policies/templates/combined/mlops-services-c.yaml

Config Usage:

mlops_engineers:
  managed_policies:
    - AmazonECS_FullAccess
    - AWSCodeDeployFullAccess
    - AWSServiceCatalogEndUserFullAccess
    - CloudWatchLogsReadOnlyAccess
  policy_assignments:
    combined_a: mlops-services-a
    combined_b: mlops-services-b
    combined_c: mlops-services-c

Why 3 policies instead of 1? The 6 services combined produce ~11,320 characters of JSON — nearly double the 6,144 character managed policy size limit. Lambda level2 (3,455 chars) and Bedrock level2 (2,896 chars) are each too large to combine with the other 4 services, so each gets its own policy.
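
The split can be reproduced with a first-fit pass over the sizes in the table above. This is a sketch of the arithmetic, not the generator's logic: it assumes sizes simply add when templates are merged (ignoring the shared JSON wrapper) and treats the four services in `mlops-services-a` as one block, because their individual sizes are not listed here.

```python
from typing import Dict, List

MAX_MANAGED_POLICY_CHARS = 6144


def pack_first_fit(
    sizes: Dict[str, int], limit: int = MAX_MANAGED_POLICY_CHARS
) -> List[List[str]]:
    """Place each item into the first policy that still has room for it."""
    policies: List[List[str]] = []
    totals: List[int] = []
    for name, size in sizes.items():
        if size > limit:
            raise ValueError(f"{name} alone exceeds the {limit}-character limit")
        for index, total in enumerate(totals):
            if total + size <= limit:
                policies[index].append(name)
                totals[index] += size
                break
        else:
            # No existing policy has room, so start a new one.
            policies.append([name])
            totals.append(size)
    return policies


sizes = {
    "s3+ecr+pipeline+sagemaker": 5069,
    "lambda-level2": 3455,
    "bedrock-level2": 2896,
}
print(pack_first_fit(sizes))
```

With these figures no two blocks fit together (even Lambda plus Bedrock is 6,351 characters), so the pass yields three policies.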

Maintenance Note: If any of the 6 source service-level templates change, the corresponding combined policy must be updated manually. Review when any source template is modified.

Assignment Recommendations¶

Typical Team Structure¶

Storage & Pipeline¶

Role	S3	ECR	Pipeline
Junior Data Scientist	read-only	read-only	read-only
Data Scientist	project-buckets-only	read-only	read-only
Senior Data Scientist	project-buckets-full	read-only	read-only
ML Engineer	project-buckets-full	dev-read-write	project-dev
MLOps Engineer	full	full	full
Backend Developer	-	read-only	-
Auditor / Compliance	read-only	read-only	read-only
Model Risk Manager	read-only	read-only	read-only
Executive / Stakeholder	-	-	read-only
CI/CD Pipeline (Role)	project-buckets-only	ci-read-write	project-ci
Platform Admin	full	full	full

Inference¶

Role	SageMaker	Lambda	Bedrock
Junior Data Scientist	read-only	-	invoke-only
Data Scientist	dev-invoke	-	invoke-only
Senior Data Scientist	dev-invoke	-	invoke-only
ML Engineer	dev-invoke	deploy-manage	model-manage
MLOps Engineer	full	full	full
Backend Developer	prod-invoke	-	invoke-only
Auditor / Compliance	read-only	-	invoke-only
Model Risk Manager	read-only	-	invoke-only
Executive / Stakeholder	-	-	invoke-only
CI/CD Pipeline (Role)	deploy-only	deploy-manage	invoke-only
Platform Admin	full	full	full

Assignment Best Practices¶

Start Minimal - Begin with read-only, expand based on actual needs
Time-Bound Elevation - Grant temporary full access for specific tasks, then revoke
Project Isolation - Use project-only levels to prevent cross-team interference
Separate Humans from Automation - Use dev-read-write for users, ci-read-write for roles
Regular Reviews - Audit access quarterly, remove unused permissions

Troubleshooting¶

Common AccessDenied Scenarios¶

“Access Denied when uploading to S3”¶

Error:

An error occurred (AccessDenied) when calling the PutObject operation

Cause: You have read-only access

Solution: Request project-buckets-only or higher

“Access Denied when deleting S3 objects”¶

Error:

An error occurred (AccessDenied) when calling the DeleteObject operation

Cause: You have project-buckets-only (no delete permission)

Solution: Request project-buckets-full access

“Access Denied when pushing to ECR”¶

Error:

denied: User: arn:aws:iam::123456789012:user/john is not authorized to perform: ecr:PutImage

Cause: You have read-only ECR access

Solution: Request dev-read-write access (if you’re a human) or ci-read-write (if you’re a CI/CD pipeline)

“Access Denied when invoking SageMaker endpoint”¶

Error:

An error occurred (AccessDeniedException) when calling the InvokeEndpoint operation

Cause: You don’t have an inference policy attached, or the endpoint is in a different environment than your policy covers

Solution:

For production endpoints: Request prod-invoke access
For dev/staging endpoints: Request dev-invoke access

“Cannot authenticate to ECR”¶

Error:

Error response from daemon: Get https://123456789012.dkr.ecr.us-east-1.amazonaws.com/v2/: no basic auth credentials

Cause: Missing GetAuthorizationToken permission or expired token

Solution:

Verify you have any ECR policy level (all include GetAuthorizationToken)
Re-run: aws ecr get-login-password | docker login ...
Check AWS credentials are valid: aws sts get-caller-identity

“Access Denied when starting or stopping a pipeline”¶

Error:

An error occurred (AccessDeniedException) when calling the StartPipelineExecution operation

Cause: You have read-only Pipeline access

Solution: Request project-dev (for development pipelines) or project-ci (for CI/CD roles)

“Access Denied when invoking a Lambda function”¶

Error:

An error occurred (AccessDeniedException) when calling the Invoke operation

Cause: You don’t have Lambda inference access, or the function name doesn’t match your policy’s resource scope

Solution: Request Lambda deploy-manage access. Verify the function follows the {company_prefix}-{env}-* naming convention.

“Access Denied when deleting a Lambda function”¶

Error:

An error occurred (AccessDeniedException) when calling the DeleteFunction operation

Cause: You have deploy-manage (Level 2) which explicitly denies delete operations

Solution: Delete operations require Lambda full (Level 3). Contact your platform administrator.

“Access Denied when invoking a Bedrock foundation model”¶

Error:

An error occurred (AccessDeniedException) when calling the InvokeModel operation

Cause: You don’t have Bedrock inference access, or the model hasn’t been enabled for the account

Solution:

Verify you have at least invoke-only access
Check that the model is enabled: someone with model-manage access must accept the model agreement first

“Access Denied when creating or deleting a Bedrock guardrail”¶

Error:

An error occurred (AccessDeniedException) when calling the CreateGuardrail operation

Cause: You have invoke-only (Level 1) which doesn’t include guardrail management

Solution:

To create/update guardrails: Request model-manage access
To delete guardrails: Request full access (Level 3) — Level 2 explicitly denies delete operations

“Access Denied when creating provisioned throughput in Bedrock”¶

Error:

An error occurred (AccessDeniedException) when calling the CreateProvisionedModelThroughput operation

Cause: Provisioned throughput is a cost-impacting operation reserved for Level 3

Solution: Request Bedrock full access. This is typically restricted to platform admins and FinOps engineers.

“Access Denied when passing a role (PassRole)”¶

Error:

An error occurred (AccessDenied) when calling the CreateFunction operation: User is not authorized to perform: iam:PassRole

Cause: Either your policy doesn’t include PassRole, or the role ARN doesn’t match the {company_prefix}-{env}-*-role-* pattern

Solution:

Verify the role follows the naming convention: {company_prefix}-{env}-*-role-*
Verify the PassRole condition matches the target service (e.g., lambda.amazonaws.com, bedrock.amazonaws.com)
If the role name is correct, request the appropriate access level that includes PassRole
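
To rule out a naming problem quickly, the role name can be tested against the convention locally. A minimal sketch, assuming shell-style wildcard matching is a close enough stand-in for the IAM resource pattern; the function name and example role names are hypothetical.

```python
from fnmatch import fnmatchcase


def role_matches_convention(role_name: str, company_prefix: str, env: str) -> bool:
    """Check a role name against the {company_prefix}-{env}-*-role-* pattern."""
    pattern = f"{company_prefix}-{env}-*-role-*"
    return fnmatchcase(role_name, pattern)


# Hypothetical role names for illustration.
for name in ("acme-dev-bedrock-role-finetune", "acme-prod-bedrock-role-finetune", "lambda-exec"):
    print(name, role_matches_convention(name, "acme", "dev"))
```

If the name matches and PassRole still fails, the service condition or the access level is the more likely cause.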

“Action explicitly denied despite having Allow permissions”¶

Error:

An error occurred (AccessDeniedException) when calling the DeleteGuardrail operation: User is not authorized to perform: bedrock:DeleteGuardrail with an explicit deny

Cause: Your policy level includes an explicit Deny statement that overrides any Allow. Lambda deploy-manage (Level 2) and Bedrock model-manage (Level 2) both include Deny blocks for destructive actions.

Solution: Explicit Deny cannot be overridden by Allow — this is by design. You need the full (Level 3) policy which removes the Deny block. Contact your platform administrator.
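
The precedence rule can be illustrated with a toy evaluator. This is a deliberately simplified sketch: it matches on actions only, ignores Resource and Condition, and is not a substitute for IAM's real evaluation logic. The statements below are hypothetical stand-ins for a Level 2 policy.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def is_allowed(action: str, statements: List[Dict]) -> bool:
    """Return True only if some Allow matches and no Deny matches."""
    allowed = False
    for statement in statements:
        patterns = statement["Action"]
        if isinstance(patterns, str):
            patterns = [patterns]
        # IAM action names are case-insensitive, so compare in lower case.
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            if statement["Effect"] == "Deny":
                return False  # an explicit Deny always wins
            allowed = True
    return allowed  # stays False when nothing matched (implicit deny)


level2_like = [
    {"Effect": "Allow", "Action": "bedrock:*"},
    {"Effect": "Deny", "Action": ["bedrock:DeleteGuardrail"]},
]

print(is_allowed("bedrock:CreateGuardrail", level2_like))
print(is_allowed("bedrock:DeleteGuardrail", level2_like))
```

Adding another Allow for the denied action anywhere in the list would not change the second result, which is why the fix is a different policy level and not an extra permission.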

Security Best Practices¶

1. Principle of Least Privilege¶

Do:

✅ Start with read-only access
✅ Grant write access only when needed
✅ Use project-only scopes when possible
✅ Limit production access to specific roles

Don’t:

❌ Give everyone full access “just in case”
❌ Use all-environments when production-only suffices
❌ Grant delete permissions without justification

2. Separation of Duties¶

Do:

✅ Assign dev-read-write to IAM users (humans)
✅ Assign ci-read-write to IAM roles (automation)
✅ Keep development and production access separate
✅ Require different people for deployment approval

Don’t:

❌ Use the same credentials for humans and CI/CD
❌ Give developers direct production write access
❌ Allow automated systems to have full admin rights

3. Audit and Monitoring¶

Do:

✅ Enable CloudTrail logging (always on)
✅ Review access logs quarterly
✅ Set up alerts for sensitive actions (DeleteBucket, DeleteRepository)
✅ Monitor for unusual access patterns

Don’t:

❌ Ignore CloudTrail logs
❌ Share IAM credentials between team members
❌ Disable logging to “improve performance”
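
One way to alert on sensitive actions is an EventBridge rule that matches API calls recorded by CloudTrail. The sketch below only builds the event pattern. The helper name is hypothetical, and creating the rule and its notification target is left to your own tooling.

```python
import json
from typing import Dict, List


def delete_alert_pattern(actions_by_service: Dict[str, List[str]]) -> dict:
    """Build an EventBridge event pattern for CloudTrail-recorded API calls."""
    return {
        "source": [f"aws.{service}" for service in actions_by_service],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": [
                f"{service}.amazonaws.com" for service in actions_by_service
            ],
            "eventName": [
                name for names in actions_by_service.values() for name in names
            ],
        },
    }


pattern = delete_alert_pattern({"s3": ["DeleteBucket"], "ecr": ["DeleteRepository"]})
print(json.dumps(pattern, indent=2))
```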

4. Credential Management¶

Do:

✅ Use IAM roles for EC2/ECS/Lambda (no hardcoded keys)
✅ Rotate access keys every 90 days
✅ Use temporary credentials (STS AssumeRole) when possible
✅ Store secrets in AWS Secrets Manager, not code

Don’t:

❌ Hardcode AWS credentials in code or Docker images
❌ Commit credentials to Git repositories
❌ Share access keys via email or Slack
❌ Use root account credentials for daily work
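
The 90-day rotation rule is easy to check in a scheduled job. A minimal sketch of the age test, assuming you already have each key's creation timestamp (for example, the CreateDate returned by IAM's ListAccessKeys API); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_KEY_AGE = timedelta(days=90)


def needs_rotation(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True when an access key is older than the 90-day rotation window."""
    now = now or datetime.now(timezone.utc)
    return now - created_at > MAX_KEY_AGE


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(needs_rotation(created, now=datetime(2024, 2, 1, tzinfo=timezone.utc)))
print(needs_rotation(created, now=datetime(2024, 4, 15, tzinfo=timezone.utc)))
```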

5. Environment Isolation¶

Do:

✅ Use separate AWS accounts for dev/staging/prod (ideal)
✅ Use resource naming conventions (acme-mlops-dev-, acme-mlops-prod-)
✅ Restrict production access to specific IAM principals
✅ Require MFA for production access

Don’t:

❌ Mix dev and prod resources in the same bucket/repository
❌ Allow dev pipelines to access prod endpoints
❌ Use the same IAM role across all environments

6. Explicit Deny for Destructive Actions¶

Do:

✅ Use Deny blocks at intermediate levels (Level 2) to prevent accidental deletion
✅ Reserve delete operations for full (Level 3) principals only
✅ Include all service-specific delete actions in the Deny block (not just the obvious ones)
✅ Document which Deny block is active at each level so users understand why Allow doesn’t work

Don’t:

❌ Rely on “absence of Allow” as a safety mechanism — explicit Deny is stronger
❌ Add Deny blocks at Level 3 (full) — defeats the purpose of full access
❌ Forget that explicit Deny overrides any Allow, even from other attached policies

7. AI/ML Service Governance¶

Do:

✅ Scope Bedrock model access using config-driven allowed_models lists per tier
✅ Enforce guardrails on all production inference workloads before launch
✅ Restrict provisioned throughput creation to FinOps-approved principals (Level 3 only)
✅ Scope PassRole to tenant-prefixed roles ({company_prefix}-{env}-*-role-*) with service conditions
✅ Use separate policy levels for model invocation vs model management

Don’t:

❌ Grant bedrock:* to non-admin roles — provisioned throughput can incur significant cost
❌ Allow unrestricted PassRole — this is the most common privilege escalation vector
❌ Skip guardrail configuration for production Bedrock workloads
❌ Let automation roles manage model access agreements — keep that as a human decision

Getting Help¶

Request Access Changes¶

Contact your MLOps platform administrator with:

Current access level - What you have now
Requested access level - What you need
Justification - Why you need it (specific use case)
Duration - Permanent or temporary (e.g., 2 weeks for project)

Report Security Issues¶

If you discover:

Overly permissive policies
Credentials in code or logs
Unauthorized access attempts
Compliance violations

Contact: security@your-company.com (replace with your security team contact)

Appendix: Policy Type Summary¶

S3 Policies¶

read-only - Safe exploration, no modifications
project-buckets-only - Standard work, no deletion
project-buckets-full - Senior users, cleanup capability
full - Platform admins only

ECR Policies¶

read-only - Pull images for local testing
dev-read-write - Humans pushing images manually
ci-read-write - Automation pushing images
full - Repository management

Pipeline Policies¶

read-only - View pipelines, logs, history (governance/audit)
project-dev - Humans creating/running pipelines (IAM users)
project-ci - Automation creating/running pipelines (IAM roles)
full - Platform-wide management

Inference Policies¶

SageMaker¶

read-only - View endpoint status/config (no invoke, no cost)
dev-invoke - Invoke dev/staging endpoints for testing
prod-invoke - Invoke production endpoints only
full - Complete endpoint lifecycle management

Lambda¶

invoke-only - Discover and invoke functions (no deploy, no delete)
deploy-manage - Deploy, update, and invoke functions (no delete)
full - Complete function lifecycle management

Bedrock¶

invoke-only - Call foundation models and list available models
model-manage - Manage model access, guardrails, imports, cross-region inference (no delete, no throughput)
full - Complete Bedrock platform management including provisioned throughput

Document Version: 1.0
Last Updated: 2024
Maintained By: MLOps Platform Team