Integration Examples

Examples showing how to use S3 infrastructure provisioned by the S3 Provisioner with common ML tools and workflows.

Table of Contents

SageMaker Integration
Lambda Integration
Glue ETL Integration
CI/CD Pipeline Integration
Cross-Provisioner Integration
SDK Examples

SageMaker Integration

Training Job

import sagemaker
from sagemaker.estimator import Estimator

# Use provisioned S3 bucket and folder structure
bucket_name = "edge-prod-b001-us-west-1-s3"
solution = "customer-churn"

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-west-1.amazonaws.com/ml-training:latest",
    role="arn:aws:iam::123456789012:role/edge-prod-b001-role-sagemaker-execution",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket_name}/solutions/{solution}/models/training/",
)

estimator.fit({
    "train": f"s3://{bucket_name}/solutions/{solution}/data/processed/train/",
    "validation": f"s3://{bucket_name}/solutions/{solution}/data/processed/validation/"
})
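The training example spells out each S3 prefix by hand. Below is a minimal sketch of a helper that builds those URIs in one place. It assumes the `solutions/<solution>/...` layout used throughout this page. `solution_uri` is a hypothetical name and is not part of the S3 Provisioner or the SageMaker SDK.

```python
def solution_uri(bucket: str, solution: str, *parts: str) -> str:
    """Build an S3 prefix URI under solutions/<solution>/ (hypothetical helper).

    A trailing slash is kept so the URI is treated as a prefix, matching the
    train, validation, and output paths in the training example.
    """
    # Strip stray slashes from each segment so the join never produces "//"
    segments = [p.strip("/") for p in parts if p.strip("/")]
    path = "/".join(["solutions", solution.strip("/"), *segments])
    return f"s3://{bucket}/{path}/"


bucket_name = "edge-prod-b001-us-west-1-s3"
solution = "customer-churn"

channels = {
    "train": solution_uri(bucket_name, solution, "data", "processed", "train"),
    "validation": solution_uri(bucket_name, solution, "data", "processed", "validation"),
}
output_path = solution_uri(bucket_name, solution, "models", "training")
```

You can pass `channels` straight to `estimator.fit(channels)` and use `output_path` as the `Estimator` `output_path`. Keeping the layout in a single function means a folder rename only has to change one line.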

Model Registry

from sagemaker.model import Model

model = Model(
    image_uri="123456789012.dkr.ecr.us-west-1.amazonaws.com/ml-inference:latest",
    # SageMaker writes model.tar.gz to <output_path>/<training job name>/output/,
    # so take the exact URI from the fitted estimator instead of guessing it
    model_data=estimator.model_data,
    role="arn:aws:iam::123456789012:role/edge-prod-b001-role-sagemaker-execution",
)

model.register(
    model_package_group_name=f"{solution}-models",
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.xlarge"],
)
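SageMaker does not write model artifacts directly under `output_path`. It uses a subfolder named after the training job. Sometimes the `Estimator` object is not available, for example when you register from a separate process. In that case, a small sketch like the one below can rebuild the artifact location from the job name. The job name in the usage line is made up for illustration.

```python
def model_artifact_uri(output_path: str, job_name: str) -> str:
    """Return where SageMaker stores model.tar.gz for a completed training job.

    Layout: <output_path>/<training job name>/output/model.tar.gz
    """
    return f"{output_path.rstrip('/')}/{job_name}/output/model.tar.gz"


# Hypothetical job name; look up the real one in the SageMaker console or API
uri = model_artifact_uri(
    "s3://edge-prod-b001-us-west-1-s3/solutions/customer-churn/models/training/",
    "ml-training-2024-01-01-00-00-00-000",
)
```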

Batch Transform

transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket_name}/solutions/{solution}/data/inference/batch/"
)

transformer.transform(
    data=f"s3://{bucket_name}/solutions/{solution}/data/inference/input/",
    content_type="text/csv"
)
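Batch transform names each output object after its input object and appends `.out`. Downstream jobs therefore need to predict output keys from input keys. The sketch below does this under one assumption: the input files sit directly under the `data/inference/input/` prefix, with no subfolders. The input file name in the example is hypothetical.

```python
import posixpath


def expected_output_uri(output_path: str, input_key: str) -> str:
    """Expected batch transform output URI for one input object.

    Assumes a flat input prefix: only the file name is carried over.
    """
    filename = posixpath.basename(input_key)
    return f"{output_path.rstrip('/')}/{filename}.out"


out = expected_output_uri(
    "s3://edge-prod-b001-us-west-1-s3/solutions/customer-churn/data/inference/batch/",
    "solutions/customer-churn/data/inference/input/customers.csv",
)
```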

Lambda Integration

Inference Function

import json

import boto3

# Create clients once per execution environment so warm invocations reuse them
s3 = boto3.client('s3')
runtime = boto3.client('sagemaker-runtime')

BUCKET = "edge-prod-b001-us-west-1-s3"
SOLUTION = "customer-churn"

def lambda_handler(event, context):
    # Read the JSON input object named in the event
    input_key = f"solutions/{SOLUTION}/data/inference/input/{event['filename']}"
    response = s3.get_object(Bucket=BUCKET, Key=input_key)
    input_data = json.loads(response['Body'].read())

    # Run inference against the SageMaker endpoint
    result = runtime.invoke_endpoint(
        EndpointName=f"edge-prod-b001-{SOLUTION}-endpoint",
        ContentType="application/json",
        Body=json.dumps(input_data)
    )

    # Write the endpoint response to the output prefix under the same file name
    output_key = f"solutions/{SOLUTION}/data/inference/output/{event['filename']}"
    s3.put_object(
        Bucket=BUCKET,
        Key=output_key,
        Body=result['Body'].read(),
        ContentType=result.get('ContentType', 'application/json')
    )

    return {"status": "success", "output_key": output_key}
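The handler logic can be exercised locally without AWS by passing the clients in as arguments. The sketch below assumes a refactor in which `handle` receives the S3 and runtime clients. `handle`, `FakeS3`, `FakeRuntime`, and the prediction payload are all hypothetical, and the fakes stub only the three client calls the handler makes.

```python
import io
import json

BUCKET = "edge-prod-b001-us-west-1-s3"
SOLUTION = "customer-churn"


def handle(event, s3, runtime, bucket=BUCKET, solution=SOLUTION):
    """Same steps as lambda_handler, with clients injected (hypothetical refactor)."""
    base = f"solutions/{solution}/data/inference"
    input_key = f"{base}/input/{event['filename']}"
    body = s3.get_object(Bucket=bucket, Key=input_key)["Body"].read()
    response = runtime.invoke_endpoint(
        EndpointName=f"edge-prod-b001-{solution}-endpoint",
        ContentType="application/json",
        Body=json.dumps(json.loads(body)),
    )
    output_key = f"{base}/output/{event['filename']}"
    s3.put_object(Bucket=bucket, Key=output_key, Body=response["Body"].read())
    return {"status": "success", "output_key": output_key}


class FakeS3:
    """In-memory stand-in for get_object/put_object, keyed by (bucket, key)."""

    def __init__(self, objects):
        self.objects = dict(objects)

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body


class FakeRuntime:
    """Records the call and returns a fixed, made-up prediction."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        self.last_call = (EndpointName, json.loads(Body))
        return {"Body": io.BytesIO(b'{"churn_probability": 0.5}')}


in_key = f"solutions/{SOLUTION}/data/inference/input/batch-001.json"
fake_s3 = FakeS3({(BUCKET, in_key): b'{"customer_id": "c-1", "tenure": 12}'})
fake_runtime = FakeRuntime()
result = handle({"filename": "batch-001.json"}, fake_s3, fake_runtime)
```

Injecting the clients leaves the production handler as a thin wrapper: `lambda_handler` would call `handle(event, s3, runtime)` with the module-level boto3 clients.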

Glue ETL Integration

Data Processing Job

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.job import Job
from pyspark.context import SparkContext
from awsglue.context import GlueContext

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)  # Paired with job.commit() so Glue can track job bookmarks

BUCKET = "edge-prod-b001-us-west-1-s3"
SOLUTION = "customer-churn"

# Read raw CSV data; withHeader treats the first row as column names
raw_data = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": [f"s3://{BUCKET}/solutions/{SOLUTION}/data/raw/"]
    },
    format="csv",
    format_options={"withHeader": True}
)

# Transform: keep and type the modeling columns (unmapped columns are dropped)
curated_data = raw_data.apply_mapping([
    ("customer_id", "string", "customer_id", "string"),
    ("tenure", "int", "tenure", "int"),
    ("monthly_charges", "double", "monthly_charges", "double"),
    ("churn", "string", "churn", "string"),
])

# Write curated data as Parquet
glueContext.write_dynamic_frame.from_options(
    frame=curated_data,
    connection_type="s3",
    connection_options={
        "path": f"s3://{BUCKET}/solutions/{SOLUTION}/data/curated/"
    },
    format="parquet"
)

job.commit()
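Before running the Glue job, it can help to preview what the mapping does to a few CSV rows. The sketch below is a plain-Python approximation, not Glue's implementation. It renames and casts the mapped columns, drops unmapped ones, and turns empty fields into `None`. The type-to-cast table and the sample rows are assumptions made for illustration.

```python
import csv
import io

# (source column, source type, target column, target type), as in the Glue job
MAPPINGS = [
    ("customer_id", "string", "customer_id", "string"),
    ("tenure", "int", "tenure", "int"),
    ("monthly_charges", "double", "monthly_charges", "double"),
    ("churn", "string", "churn", "string"),
]

# Python casts standing in for the Glue target types (an approximation)
CASTS = {"string": str, "int": int, "double": float}


def preview_mapping(csv_text, mappings=MAPPINGS):
    """Apply rename-and-cast mappings to CSV rows with a header line."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for src, _src_type, dst, dst_type in mappings:
            value = record.get(src)
            # Empty or missing fields become None instead of failing the cast
            row[dst] = CASTS[dst_type](value) if value not in (None, "") else None
        rows.append(row)
    return rows


sample = (
    "customer_id,tenure,monthly_charges,churn,notes\n"
    "c-1,12,70.5,No,vip\n"
    "c-2,,20.0,Yes,\n"
)
preview = preview_mapping(sample)
```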

CI/CD Pipeline Integration

GitHub Actions

name: Deploy S3 Infrastructure
on:
  push:
    branches: [main]
    paths: ['s3/configs/**']

jobs:
  deploy:
    runs-on: ubuntu-latest
    # Assuming an IAM role through GitHub OIDC requires an ID token
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActions
          aws-region: us-west-1

      # The steps below assume the s3-provisioner:latest image is already
      # available on the runner (built or pulled in an earlier step).
      # configure-aws-credentials exports credentials as environment variables
      # rather than writing ~/.aws, so they are forwarded into the container with -e.
      - name: Validate configuration
        run: |
          docker run --rm \
            -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN -e AWS_REGION \
            -v $(pwd)/s3/configs:/app/configs:ro \
            -v $(pwd)/s3/reports:/app/reports \
            s3-provisioner:latest \
            --config edge-prod-b001-us-west-1-s3.yaml \
            --action validate-config

      - name: Preview changes
        run: |
          docker run --rm \
            -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN -e AWS_REGION \
            -v $(pwd)/s3/configs:/app/configs:ro \
            -v $(pwd)/s3/templates:/app/templates \
            -v $(pwd)/s3/reports:/app/reports \
            s3-provisioner:latest \
            --config edge-prod-b001-us-west-1-s3.yaml \
            --action show-changes \
            --solution master-solution

      - name: Deploy
        run: |
          docker run --rm \
            -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN -e AWS_REGION \
            -v $(pwd)/s3/configs:/app/configs:ro \
            -v $(pwd)/s3/templates:/app/templates \
            -v $(pwd)/s3/reports:/app/reports \
            s3-provisioner:latest \
            --config edge-prod-b001-us-west-1-s3.yaml \
            --action prep-master \
            --solution master-solution \
            --force
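The validate, preview, and deploy sequence can also be run from a workstation, where credentials usually live in `~/.aws`. The sketch below builds the same `docker run` argument lists with the flags shown in the workflow. `provisioner_cmd` is a hypothetical helper. It assumes the repository keeps `s3/configs`, `s3/templates`, and `s3/reports` under the current directory.

```python
import os
import shlex

IMAGE = "s3-provisioner:latest"
CONFIG = "edge-prod-b001-us-west-1-s3.yaml"


def provisioner_cmd(action, config=CONFIG, solution=None, force=False,
                    with_templates=False, repo_root="."):
    """Build a docker run argv mirroring the workflow steps (hypothetical helper)."""
    s3_dir = os.path.abspath(os.path.join(repo_root, "s3"))
    aws_dir = os.path.expanduser("~/.aws")
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{aws_dir}:/home/s3user/.aws:ro",
        "-v", f"{s3_dir}/configs:/app/configs:ro",
    ]
    if with_templates:
        cmd += ["-v", f"{s3_dir}/templates:/app/templates"]
    cmd += ["-v", f"{s3_dir}/reports:/app/reports", IMAGE,
            "--config", config, "--action", action]
    if solution:
        cmd += ["--solution", solution]
    if force:
        cmd.append("--force")
    return cmd


steps = [
    provisioner_cmd("validate-config"),
    provisioner_cmd("show-changes", solution="master-solution", with_templates=True),
    provisioner_cmd("prep-master", solution="master-solution", with_templates=True, force=True),
]
for argv in steps:
    # Print for review; pass argv to subprocess.run(argv, check=True) to execute
    print(shlex.join(argv))
```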

Cross-Provisioner Integration

Using S3 Bucket with SEC Provisioner

The SEC Provisioner uploads CloudFormation templates to the S3 bucket created by the S3 Provisioner:

# SEC config references S3 bucket
deployment:
  template_bucket: edge-prod-b001-us-west-1-s3
  template_prefix: solutions/master-solution/templates
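CloudFormation can read a template from an S3 object URL, so `template_bucket` and `template_prefix` are enough to locate each uploaded template. The sketch below builds a virtual-hosted-style URL for one template. The region default matches the bucket name on this page. The file name `sec-roles.yaml` is hypothetical and does not reflect how the SEC Provisioner actually names its files.

```python
def template_url(bucket, prefix, template, region="us-west-1"):
    """HTTPS URL of a template object, usable as a CloudFormation TemplateURL."""
    # Join prefix and file name without doubling or dropping slashes
    key = "/".join(p.strip("/") for p in (prefix, template) if p.strip("/"))
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


url = template_url(
    "edge-prod-b001-us-west-1-s3",
    "solutions/master-solution/templates",
    "sec-roles.yaml",  # hypothetical template name
)
```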

Using S3 Bucket with VPC Endpoint

The VPC Provisioner can create an S3 VPC endpoint for private access:

# S3 config references VPC
s3:
  vpc_id: "vpc-0a1b2c3d4e5f6a7b8"
  route_table_ids: "rtb-0a1b2c3d,rtb-4e5f6a7b"
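`route_table_ids` is a single comma-separated string, so any consumer has to split it. The sketch below splits the string and sanity-checks the shape of each ID before the values reach CloudFormation. It assumes EC2-style IDs: a prefix followed by 8 or 17 lowercase hex characters. A check like this catches placeholder IDs that contain non-hex letters. `parse_vpc_settings` is a hypothetical helper.

```python
import re


def is_aws_id(value, prefix):
    """True if value looks like <prefix>-<8 or 17 lowercase hex characters>."""
    pattern = rf"{re.escape(prefix)}-(?:[0-9a-f]{{8}}|[0-9a-f]{{17}})"
    return re.fullmatch(pattern, value) is not None


def parse_vpc_settings(vpc_id, route_table_ids):
    """Split route_table_ids on commas and reject malformed IDs."""
    tables = [t.strip() for t in route_table_ids.split(",") if t.strip()]
    bad = [vpc_id] if not is_aws_id(vpc_id, "vpc") else []
    bad += [t for t in tables if not is_aws_id(t, "rtb")]
    if bad:
        raise ValueError(f"malformed IDs: {bad}")
    return vpc_id, tables


vpc, tables = parse_vpc_settings("vpc-0a1b2c3d4e5f6a7b8", "rtb-0a1b2c3d, rtb-4e5f6a7b")
```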

SDK Examples

Boto3 — List Solutions

import boto3

s3 = boto3.client('s3')
bucket = "edge-prod-b001-us-west-1-s3"

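# Paginate: a single list_objects_v2 call returns at most 1,000 keys and prefixes
paginator = s3.get_paginator('list_objects_v2')
solutions = []
for page in paginator.paginate(Bucket=bucket, Prefix="solutions/", Delimiter="/"):
    # Each CommonPrefix looks like "solutions/<name>/"
    solutions.extend(p['Prefix'].split('/')[1] for p in page.get('CommonPrefixes', []))

print(f"Deployed solutions: {solutions}")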
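To see which top-level folders each solution actually contains, list the keys without a `Delimiter` and group them. With a `list_objects_v2` paginator, the keys come from `obj["Key"]` for each `obj` in `page.get("Contents", [])`. The sketch below shows only the grouping step and runs on a hard-coded, hypothetical key list so it works without AWS.

```python
from collections import defaultdict


def folders_by_solution(keys):
    """Map each solution name to the sorted top-level folders found in its keys."""
    tree = defaultdict(set)
    for key in keys:
        parts = key.split("/")
        # Expect solutions/<solution>/<folder>/...; skip shallower keys
        if len(parts) >= 4 and parts[0] == "solutions":
            tree[parts[1]].add(parts[2])
    return {name: sorted(folders) for name, folders in tree.items()}


# Hypothetical keys, shaped like the placeholders the provisioner creates
keys = [
    "solutions/customer-churn/data/raw/.gitkeep",
    "solutions/customer-churn/data/curated/.gitkeep",
    "solutions/customer-churn/models/training/.gitkeep",
    "solutions/master-solution/templates/.gitkeep",
    "solutions/README.txt",
]
summary = folders_by_solution(keys)
```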

AWS CLI — Verify Deployment

# List solution folders
aws s3 ls s3://edge-prod-b001-us-west-1-s3/solutions/

# Count folders in a solution (each folder holds a .gitkeep placeholder)
aws s3 ls s3://edge-prod-b001-us-west-1-s3/solutions/customer-churn/ --recursive | grep -c '/\.gitkeep$'

# Check CloudFormation stack
aws cloudformation describe-stacks \
  --stack-name edge-prod-b001-us-west-1-s3-stack \
  --query 'Stacks[0].StackStatus' \
  --output text
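A script that gates on the stack check needs to decide which `StackStatus` values count as a successful deployment. The sketch below groups statuses into three buckets. Any `*_IN_PROGRESS` status means a change is still running. `CREATE_COMPLETE`, `UPDATE_COMPLETE`, and `IMPORT_COMPLETE` count as OK. Everything else, including rollbacks that finished, is treated as a failed deployment. That grouping is a judgment call rather than an AWS definition.

```python
def classify_stack_status(status):
    """Group a CloudFormation StackStatus into ok / in_progress / failed."""
    status = status.strip()  # CLI text output ends with a newline
    # Includes UPDATE_COMPLETE_CLEANUP_IN_PROGRESS and similar cleanup states
    if status.endswith("_IN_PROGRESS"):
        return "in_progress"
    if status in {"CREATE_COMPLETE", "UPDATE_COMPLETE", "IMPORT_COMPLETE"}:
        return "ok"
    # ROLLBACK_COMPLETE, UPDATE_ROLLBACK_COMPLETE, *_FAILED: the last change did not apply
    return "failed"
```

To use it, feed the function the text printed by the `describe-stacks` command with `--output text`.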