Integration Examples
Examples showing how to use S3 infrastructure provisioned by the S3 Provisioner with common ML tools and workflows.
SageMaker Integration
Training Job
import sagemaker
from sagemaker.estimator import Estimator

# Use provisioned S3 bucket and folder structure
bucket_name = "edge-prod-b001-us-west-1-s3"
solution = "customer-churn"

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-west-1.amazonaws.com/ml-training:latest",
    role="arn:aws:iam::123456789012:role/edge-prod-b001-role-sagemaker-execution",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket_name}/solutions/{solution}/models/training/",
)

estimator.fit({
    "train": f"s3://{bucket_name}/solutions/{solution}/data/processed/train/",
    "validation": f"s3://{bucket_name}/solutions/{solution}/data/processed/validation/",
})
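The examples on this page repeat the same `solutions/<solution>/...` prefix in every S3 URI. A minimal sketch of a helper that centralizes that convention is shown below. The function name `solution_uri` is hypothetical and not part of the S3 Provisioner; the folder names are taken from the examples on this page.

```python
def solution_uri(bucket: str, solution: str, *parts: str) -> str:
    """Build an S3 URI under solutions/<solution>/ (hypothetical helper).

    Each entry in `parts` is a folder name or a slash-separated sub-path;
    leading and trailing slashes are stripped so callers can pass either form.
    """
    segments = ["solutions", solution]
    segments += [p.strip("/") for p in parts if p.strip("/")]
    return f"s3://{bucket}/" + "/".join(segments) + "/"


# Example: the training channel used in the estimator.fit() call
train_uri = solution_uri(
    "edge-prod-b001-us-west-1-s3", "customer-churn", "data", "processed", "train"
)
```

Keeping the prefix logic in one place means a change to the folder layout only needs to be made once, assuming every caller goes through the helper.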
Model Registry
from sagemaker.model import Model

# SageMaker writes training artifacts to
# <output_path>/<training-job-name>/output/model.tar.gz,
# so the job name from the training step is part of the path.
training_job_name = estimator.latest_training_job.name

model = Model(
    image_uri="123456789012.dkr.ecr.us-west-1.amazonaws.com/ml-inference:latest",
    model_data=f"s3://{bucket_name}/solutions/{solution}/models/training/{training_job_name}/output/model.tar.gz",
    role="arn:aws:iam::123456789012:role/edge-prod-b001-role-sagemaker-execution",
)

model.register(
    model_package_group_name=f"{solution}-models",
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.m5.large"],
)
Batch Transform
# Reuses the estimator from the training job above
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket_name}/solutions/{solution}/data/inference/batch/",
)

transformer.transform(
    data=f"s3://{bucket_name}/solutions/{solution}/data/inference/input/",
    content_type="text/csv",
)
Lambda Integration
Inference Function
import boto3
import json

s3 = boto3.client('s3')

BUCKET = "edge-prod-b001-us-west-1-s3"
SOLUTION = "customer-churn"


def lambda_handler(event, context):
    # Read input from S3
    input_key = f"solutions/{SOLUTION}/data/inference/input/{event['filename']}"
    response = s3.get_object(Bucket=BUCKET, Key=input_key)
    input_data = json.loads(response['Body'].read())

    # Run inference (call SageMaker endpoint)
    runtime = boto3.client('sagemaker-runtime')
    result = runtime.invoke_endpoint(
        EndpointName=f"edge-prod-b001-{SOLUTION}-endpoint",
        ContentType="application/json",
        Body=json.dumps(input_data)
    )

    # Write results to S3
    output_key = f"solutions/{SOLUTION}/data/inference/output/{event['filename']}"
    s3.put_object(
        Bucket=BUCKET,
        Key=output_key,
        Body=result['Body'].read()
    )

    return {"status": "success", "output_key": output_key}
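The handler above reads `event['filename']` directly, so a missing field raises `KeyError`, and a value containing `/` would read from or write to a nested prefix rather than the flat `input/` and `output/` folders. One possible way to guard against both is sketched below. The function name `inference_keys` is hypothetical, and the assumption that filenames should never contain a slash is ours, not something the S3 Provisioner enforces.

```python
from typing import Tuple


def inference_keys(event: dict, solution: str) -> Tuple[str, str]:
    """Return (input_key, output_key) for an inference request.

    Hypothetical helper: rejects events without a usable 'filename' and
    filenames containing '/', so both keys stay directly under the
    data/inference/input/ and data/inference/output/ folders.
    """
    filename = event.get("filename")
    if not isinstance(filename, str) or not filename or "/" in filename:
        raise ValueError(f"invalid filename in event: {filename!r}")
    base = f"solutions/{solution}/data/inference"
    return f"{base}/input/{filename}", f"{base}/output/{filename}"


# Example event, as a Lambda test invocation might send it
input_key, output_key = inference_keys({"filename": "batch-01.json"}, "customer-churn")
```

Inside `lambda_handler`, the two f-strings that build `input_key` and `output_key` could be replaced by a single call to this helper.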
Glue ETL Integration
Data Processing Job
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)

BUCKET = "edge-prod-b001-us-west-1-s3"
SOLUTION = "customer-churn"

# Read raw data
raw_data = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": [f"s3://{BUCKET}/solutions/{SOLUTION}/data/raw/"]
    },
    format="csv",
    # Treat the first row as column names so the mapping below can refer to them
    format_options={"withHeader": True},
)

# Transform
curated_data = raw_data.apply_mapping([
    ("customer_id", "string", "customer_id", "string"),
    ("tenure", "int", "tenure", "int"),
    ("monthly_charges", "double", "monthly_charges", "double"),
    ("churn", "string", "churn", "string"),
])

# Write curated data
glueContext.write_dynamic_frame.from_options(
    frame=curated_data,
    connection_type="s3",
    connection_options={
        "path": f"s3://{BUCKET}/solutions/{SOLUTION}/data/curated/"
    },
    format="parquet",
)
CI/CD Pipeline Integration
GitHub Actions
name: Deploy S3 Infrastructure
on:
  push:
    branches: [main]
    paths: ['s3/configs/**']
# Needed when role-to-assume is used with GitHub OIDC (no static keys are configured below)
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActions
          aws-region: us-west-1
      - name: Validate configuration
        run: |
          docker run --rm \
            -v ~/.aws:/home/s3user/.aws:ro \
            -v $(pwd)/s3/configs:/app/configs:ro \
            -v $(pwd)/s3/reports:/app/reports \
            s3-provisioner:latest \
            --config edge-prod-b001-us-west-1-s3.yaml \
            --action validate-config
      - name: Preview changes
        run: |
          docker run --rm \
            -v ~/.aws:/home/s3user/.aws:ro \
            -v $(pwd)/s3/configs:/app/configs:ro \
            -v $(pwd)/s3/templates:/app/templates \
            -v $(pwd)/s3/reports:/app/reports \
            s3-provisioner:latest \
            --config edge-prod-b001-us-west-1-s3.yaml \
            --action show-changes \
            --solution master-solution
      - name: Deploy
        run: |
          docker run --rm \
            -v ~/.aws:/home/s3user/.aws:ro \
            -v $(pwd)/s3/configs:/app/configs:ro \
            -v $(pwd)/s3/templates:/app/templates \
            -v $(pwd)/s3/reports:/app/reports \
            s3-provisioner:latest \
            --config edge-prod-b001-us-west-1-s3.yaml \
            --action prep-master \
            --solution master-solution \
            --force
Cross-Provisioner Integration
Using S3 Bucket with SEC Provisioner
The SEC Provisioner uploads CloudFormation templates to the S3 bucket created by the S3 Provisioner:
# SEC config references S3 bucket
deployment:
  template_bucket: edge-prod-b001-us-west-1-s3
  template_prefix: solutions/master-solution/templates
Using S3 Bucket with VPC Endpoint
The VPC Provisioner can create an S3 VPC endpoint for private access:
# S3 config references VPC
s3:
  vpc_id: "vpc-0a1b2c3d4e5f6g7h8"
  route_table_ids: "rtb-0a1b2c3d,rtb-4e5f6g7h"
SDK Examples
Boto3: List Solutions
import boto3

s3 = boto3.client('s3')
bucket = "edge-prod-b001-us-west-1-s3"

# Delimiter="/" groups keys by their first folder under solutions/
response = s3.list_objects_v2(
    Bucket=bucket,
    Prefix="solutions/",
    Delimiter="/"
)

solutions = [p['Prefix'].split('/')[1] for p in response.get('CommonPrefixes', [])]
print(f"Deployed solutions: {solutions}")
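A single `list_objects_v2` call returns at most 1,000 entries, so the listing above can be incomplete for a bucket with many solutions. The sketch below uses the boto3 paginator to follow every page. The function name `list_solutions` is hypothetical; the client is passed in as an argument so the function can be exercised without AWS access.

```python
from typing import List


def list_solutions(s3_client, bucket: str, prefix: str = "solutions/") -> List[str]:
    """Return the names of all top-level folders under `prefix`.

    Hypothetical helper: follows every result page instead of reading
    only the first response.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    names = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for common in page.get("CommonPrefixes", []):
            # "solutions/customer-churn/" -> "customer-churn"
            names.append(common["Prefix"][len(prefix):].rstrip("/"))
    return names


# Usage with a real client (requires AWS credentials):
#   import boto3
#   print(list_solutions(boto3.client("s3"), "edge-prod-b001-us-west-1-s3"))
```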
AWS CLI: Verify Deployment
# List solution folders
aws s3 ls s3://edge-prod-b001-us-west-1-s3/solutions/

# Count folders in a solution (counts the .gitkeep placeholder files)
aws s3 ls s3://edge-prod-b001-us-west-1-s3/solutions/customer-churn/ --recursive | grep -c '\.gitkeep$'

# Check CloudFormation stack
aws cloudformation describe-stacks \
  --stack-name edge-prod-b001-us-west-1-s3-stack \
  --query 'Stacks[0].StackStatus'
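The same stack check can be scripted with boto3, for example as a gate in a deployment pipeline. This is a minimal sketch: the function name `stack_status` is hypothetical, and treating only `CREATE_COMPLETE` and `UPDATE_COMPLETE` as healthy is an assumption that may need adjusting for your workflow.

```python
from typing import Tuple

# Assumption: only these two terminal states count as a healthy deployment
HEALTHY_STATES = ("CREATE_COMPLETE", "UPDATE_COMPLETE")


def stack_status(cfn_client, stack_name: str) -> Tuple[str, bool]:
    """Return (status, is_healthy) for a CloudFormation stack.

    Hypothetical helper. describe_stacks raises a ClientError when the
    stack does not exist; that error is left to propagate to the caller.
    """
    stacks = cfn_client.describe_stacks(StackName=stack_name)["Stacks"]
    status = stacks[0]["StackStatus"]
    return status, status in HEALTHY_STATES


# Usage with a real client (requires AWS credentials):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   print(stack_status(cfn, "edge-prod-b001-us-west-1-s3-stack"))
```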