Integration Examples

These examples show how to use the S3 bucket and folder structure created by the S3 Provisioner with common ML tools and workflows.

SageMaker Integration

Training Job

import sagemaker
from sagemaker.estimator import Estimator

# Use provisioned S3 bucket and folder structure
bucket_name = "edge-prod-b001-us-west-1-s3"
solution = "customer-churn"

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-west-1.amazonaws.com/ml-training:latest",
    role="arn:aws:iam::123456789012:role/edge-prod-b001-role-sagemaker-execution",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket_name}/solutions/{solution}/models/training/",
)

estimator.fit({
    "train": f"s3://{bucket_name}/solutions/{solution}/data/processed/train/",
    "validation": f"s3://{bucket_name}/solutions/{solution}/data/processed/validation/"
})
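
Every example in this guide builds paths of the form `s3://<bucket>/solutions/<solution>/...`. A small helper can keep those paths consistent across scripts. This is a minimal sketch: `solution_uri` is a hypothetical name, not part of the S3 Provisioner or the SageMaker SDK, and it assumes the folder layout shown in the training example above.

```python
def solution_uri(bucket: str, solution: str, *parts: str) -> str:
    """Build an S3 URI for a folder under solutions/<solution>/.

    Hypothetical helper for illustration; it only formats strings and
    does not check that the folder exists in the bucket.
    """
    segments = ["solutions", solution, *parts]
    # Strip stray slashes so "data/" and "/data" both yield a clean path
    key = "/".join(s.strip("/") for s in segments if s.strip("/"))
    # Folder URIs end with a slash, matching the paths used in this guide
    return f"s3://{bucket}/{key}/"


train_uri = solution_uri(
    "edge-prod-b001-us-west-1-s3", "customer-churn", "data", "processed", "train"
)
print(train_uri)
```

The result can be passed wherever the examples use an f-string, for instance as the `"train"` channel in `estimator.fit`.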

Model Registry

from sagemaker.model import Model

model = Model(
    image_uri="123456789012.dkr.ecr.us-west-1.amazonaws.com/ml-inference:latest",
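    # SageMaker writes artifacts to <output_path>/<training-job-name>/output/model.tar.gz,
    # so take the exact URI from the fitted estimator instead of hard-coding it.
    model_data=estimator.model_data,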
    role="arn:aws:iam::123456789012:role/edge-prod-b001-role-sagemaker-execution",
)

model.register(
    model_package_group_name=f"{solution}-models",
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.m5.large"],
)

Batch Transform

transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket_name}/solutions/{solution}/data/inference/batch/"
)

transformer.transform(
    data=f"s3://{bucket_name}/solutions/{solution}/data/inference/input/",
    content_type="text/csv"
)

Lambda Integration

Inference Function

import json

import boto3

# Create clients once per execution environment so warm invocations reuse them
s3 = boto3.client("s3")
runtime = boto3.client("sagemaker-runtime")

BUCKET = "edge-prod-b001-us-west-1-s3"
SOLUTION = "customer-churn"
ENDPOINT_NAME = f"edge-prod-b001-{SOLUTION}-endpoint"

def lambda_handler(event, context):
    # The event is expected to carry the input object's file name under "filename"
    filename = event["filename"]

    # Read the JSON input from the solution's inference input folder
    input_key = f"solutions/{SOLUTION}/data/inference/input/{filename}"
    response = s3.get_object(Bucket=BUCKET, Key=input_key)
    input_data = json.loads(response["Body"].read())

    # Run inference against the SageMaker endpoint
    result = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(input_data),
    )

    # Write the raw endpoint response to the inference output folder
    output_key = f"solutions/{SOLUTION}/data/inference/output/{filename}"
    s3.put_object(
        Bucket=BUCKET,
        Key=output_key,
        Body=result["Body"].read(),
    )

    return {"status": "success", "output_key": output_key}
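
The handler above expects the caller to pass a `filename` field. If the function were instead triggered by S3 event notifications on the input folder (an assumption; this guide does not say how the function is invoked), the file names could be derived from the event records. The sketch below relies on the standard S3 notification layout, in which object keys arrive URL-encoded; `filenames_from_s3_event` and `INPUT_PREFIX` are hypothetical names.

```python
from urllib.parse import unquote_plus

# Assumed input folder, matching the key layout used by the handler
INPUT_PREFIX = "solutions/customer-churn/data/inference/input/"


def filenames_from_s3_event(event: dict, prefix: str = INPUT_PREFIX) -> list:
    """Return file names (relative to prefix) for objects named in an S3 event."""
    names = []
    for record in event.get("Records", []):
        # Keys in S3 notifications are URL-encoded, with spaces sent as "+"
        key = unquote_plus(record["s3"]["object"]["key"])
        # Skip objects outside the input folder and folder placeholder keys
        if key.startswith(prefix) and not key.endswith("/"):
            names.append(key[len(prefix):])
    return names
```

Each returned name could then be used in place of `event['filename']` when building the input and output keys.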

Glue ETL Integration

Data Processing Job

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# Initialize the job so Glue can track its run state (and bookmarks, if enabled)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

BUCKET = "edge-prod-b001-us-west-1-s3"
SOLUTION = "customer-churn"

# Read raw CSV data. withHeader assumes the raw files have a header row;
# without it the columns are not named and the mapping below cannot match them.
raw_data = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": [f"s3://{BUCKET}/solutions/{SOLUTION}/data/raw/"]
    },
    format="csv",
    format_options={"withHeader": True},
)

# Transform: keep the four columns of interest with explicit types
curated_data = raw_data.apply_mapping([
    ("customer_id", "string", "customer_id", "string"),
    ("tenure", "int", "tenure", "int"),
    ("monthly_charges", "double", "monthly_charges", "double"),
    ("churn", "string", "churn", "string"),
])

# Write curated data as Parquet
glueContext.write_dynamic_frame.from_options(
    frame=curated_data,
    connection_type="s3",
    connection_options={
        "path": f"s3://{BUCKET}/solutions/{SOLUTION}/data/curated/"
    },
    format="parquet"
)

# Mark the job run as complete
job.commit()

CI/CD Pipeline Integration

GitHub Actions

name: Deploy S3 Infrastructure
on:
  push:
    branches: [main]
    paths: ['s3/configs/**']

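# id-token: write lets the job request the OIDC token used by role-to-assume
permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActions
          aws-region: us-west-1

      # configure-aws-credentials exports credentials as environment variables
      # rather than writing ~/.aws, so the steps below pass them into the
      # container with -e. This assumes the provisioner uses the standard AWS
      # credential chain.
      - name: Validate configuration
        run: |
          docker run --rm \
            -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN \
            -e AWS_REGION -e AWS_DEFAULT_REGION \
            -v "$(pwd)/s3/configs:/app/configs:ro" \
            -v "$(pwd)/s3/reports:/app/reports" \
            s3-provisioner:latest \
            --config edge-prod-b001-us-west-1-s3.yaml \
            --action validate-config

      - name: Preview changes
        run: |
          docker run --rm \
            -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN \
            -e AWS_REGION -e AWS_DEFAULT_REGION \
            -v "$(pwd)/s3/configs:/app/configs:ro" \
            -v "$(pwd)/s3/templates:/app/templates" \
            -v "$(pwd)/s3/reports:/app/reports" \
            s3-provisioner:latest \
            --config edge-prod-b001-us-west-1-s3.yaml \
            --action show-changes \
            --solution master-solution

      - name: Deploy
        run: |
          docker run --rm \
            -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN \
            -e AWS_REGION -e AWS_DEFAULT_REGION \
            -v "$(pwd)/s3/configs:/app/configs:ro" \
            -v "$(pwd)/s3/templates:/app/templates" \
            -v "$(pwd)/s3/reports:/app/reports" \
            s3-provisioner:latest \
            --config edge-prod-b001-us-west-1-s3.yaml \
            --action prep-master \
            --solution master-solution \
            --force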

Cross-Provisioner Integration

Using S3 Bucket with SEC Provisioner

The SEC Provisioner uploads CloudFormation templates to the S3 bucket created by the S3 Provisioner:

# SEC config references S3 bucket
deployment:
  template_bucket: edge-prod-b001-us-west-1-s3
  template_prefix: solutions/master-solution/templates

Using S3 Bucket with VPC Endpoint

The VPC Provisioner can create an S3 VPC endpoint for private access:

# S3 config references VPC
s3:
  vpc_id: "vpc-0a1b2c3d4e5f67890"
  route_table_ids: "rtb-0a1b2c3d,rtb-4e5f6a7b"

SDK Examples

Boto3 — List Solutions

import boto3

s3 = boto3.client('s3')
bucket = "edge-prod-b001-us-west-1-s3"

response = s3.list_objects_v2(
    Bucket=bucket,
    Prefix="solutions/",
    Delimiter="/"
)

solutions = [p['Prefix'].split('/')[1] for p in response.get('CommonPrefixes', [])]
print(f"Deployed solutions: {solutions}")
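
A single `list_objects_v2` call returns at most 1,000 entries, so a bucket with many solutions would be listed only partially. The sketch below uses boto3's paginator to walk every page. `list_solutions` is a hypothetical helper name; it takes the client as an argument so it can be exercised without AWS access.

```python
def list_solutions(s3_client, bucket: str, prefix: str = "solutions/") -> list:
    """Return the names of all first-level folders under prefix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    solutions = []
    # Delimiter="/" groups keys into CommonPrefixes, one per solution folder
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for common in page.get("CommonPrefixes", []):
            # "solutions/customer-churn/" becomes "customer-churn"
            solutions.append(common["Prefix"][len(prefix):].rstrip("/"))
    return solutions
```

With a real client this would be called as `list_solutions(boto3.client("s3"), "edge-prod-b001-us-west-1-s3")`.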

AWS CLI — Verify Deployment

# List solution folders
aws s3 ls s3://edge-prod-b001-us-west-1-s3/solutions/

# Count folders in a solution (each folder is marked by a .gitkeep placeholder)
aws s3 ls s3://edge-prod-b001-us-west-1-s3/solutions/customer-churn/ --recursive | grep -c '\.gitkeep$'

# Check CloudFormation stack
aws cloudformation describe-stacks \
  --stack-name edge-prod-b001-us-west-1-s3-stack \
  --query 'Stacks[0].StackStatus'
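
The same stack check can be scripted in Python, for example as a post-deployment gate. This is a sketch under assumptions: `stack_is_healthy` and `SUCCESS_STATES` are hypothetical names, and treating only `CREATE_COMPLETE` and `UPDATE_COMPLETE` as healthy is a choice made for this example, not something the S3 Provisioner defines.

```python
# Stack statuses treated as a successful deployment (assumption for this sketch)
SUCCESS_STATES = {"CREATE_COMPLETE", "UPDATE_COMPLETE"}


def stack_is_healthy(cfn_client, stack_name: str) -> bool:
    """Return True if the stack's current status is in SUCCESS_STATES.

    describe_stacks raises a ClientError when the stack does not exist;
    that error is deliberately left to propagate to the caller.
    """
    response = cfn_client.describe_stacks(StackName=stack_name)
    status = response["Stacks"][0]["StackStatus"]
    return status in SUCCESS_STATES
```

With a real client this would be called as `stack_is_healthy(boto3.client("cloudformation"), "edge-prod-b001-us-west-1-s3-stack")`.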