Cost Optimization¶
Cost optimization strategies for S3 infrastructure provisioned by the S3 Provisioner.
Cost Overview¶
S3 costs are driven by three factors:
| Cost Factor | Description | Typical Impact |
|---|---|---|
| Storage | GB stored per month | 60-80% of total |
| Requests | PUT, GET, LIST operations | 10-20% of total |
| Data Transfer | Data out to internet or other regions | 10-30% of total |
Key insight: Lifecycle policies are the single most effective cost optimization — they can reduce storage costs by 60-80%.
Built-in Cost Estimation¶
The S3 Provisioner includes a built-in cost estimation feature that calculates costs specific to your configuration and region. Instead of using the generic estimates in this guide, run the cost estimation workflow for accurate numbers:
# Step 1: Generate usage assumptions (edit defaults to match your usage)
--action cost-traffic --solution master-solution
# Step 2: Calculate costs with region-specific pricing
--action cost-estimate --solution master-solution
The tool produces:
Console output with storage, request, transfer, and VPC Endpoint cost breakdown
Professional HTML report with per-category detail
Monthly and annual totals using on-demand pricing for your region
Edit the usage assumptions file and re-run cost-estimate to model different scenarios (e.g., 100 GB vs. 1 TB storage, low vs. high request volume).
To update pricing data to the latest AWS rates:
--action cost-refresh-prices --solution master-solution
See the User Guide for the complete command reference.
Lifecycle Policy Comparison¶
The S3 Provisioner supports four lifecycle profiles. The comparison below is based on a 100 TB ML pipeline, with costs shown per year:
| Policy | Year 1 Cost | Year 2 Cost | 5-Year Total | Savings |
|---|---|---|---|---|
| none | $27,600 | $27,600 | $138,000 | 0% |
| ml-optimized | $10,140 | $10,140 | $50,700 | 63% |
| compliance | $7,080 | $7,080 | $35,400 | 74% |
| development | ~$2,300 | $0 | $2,300 | 98% |
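The figures in this comparison can be cross-checked with a few lines of arithmetic. The sketch below assumes the `none` baseline is 100 TB held entirely in S3 Standard at $0.023/GB/month with 1 TB counted as 1,000 GB, and it recomputes the savings column from the 5-year totals. It does not model how the other profiles' yearly costs were derived.

```python
# Storage price for S3 Standard in US East (N. Virginia), per GB-month.
STANDARD_PRICE_PER_GB_MONTH = 0.023
DATASET_GB = 100_000  # 100 TB, assuming 1 TB = 1,000 GB

# Annual cost with no lifecycle policy: everything stays in S3 Standard.
baseline_annual = DATASET_GB * STANDARD_PRICE_PER_GB_MONTH * 12

# 5-year totals copied from the comparison table.
five_year_totals = {
    "none": 138_000,
    "ml-optimized": 50_700,
    "compliance": 35_400,
    "development": 2_300,
}


def savings_percent(total: float, baseline: float) -> int:
    """Savings relative to the baseline, rounded to a whole percent."""
    return round((1 - total / baseline) * 100)


savings = {
    name: savings_percent(total, five_year_totals["none"])
    for name, total in five_year_totals.items()
}

print(f"Baseline annual cost: ${baseline_annual:,.0f}")
for name, pct in savings.items():
    print(f"{name:<13} {pct}%")
```

The recomputed percentages match the Savings column, which confirms that the column is measured against the 5-year total of the `none` profile.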
Profile Details¶
ml-optimized (recommended for production):
30 days → S3 Standard-IA (infrequent access)
90 days → S3 Glacier Flexible Retrieval
Balances cost with data accessibility for ML workflows
compliance (regulated industries):
90 days → S3 Glacier Flexible Retrieval
7-year retention for audit requirements
Lowest long-term cost for data that must be retained
development (dev/test environments):
90-day expiration — data automatically deleted
Zero long-term storage cost
Ideal for experiment data and temporary artifacts
none (manual management):
No automatic transitions or expirations
Full control but highest cost if not actively managed
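For readers who want to see what these profiles could look like as S3 lifecycle rules, here is a minimal sketch that builds dictionaries in the shape accepted by the S3 `PutBucketLifecycleConfiguration` API. The function name `lifecycle_rules` and the rule IDs are hypothetical, and the rules the S3 Provisioner actually generates may differ. The 7-year retention of the compliance profile is not modeled here.

```python
import json


def lifecycle_rules(profile: str) -> dict:
    """Return an illustrative S3 lifecycle configuration for a profile name.

    This is a sketch based on the profile descriptions, not the
    S3 Provisioner's real output.
    """
    if profile == "none":
        # Nothing to apply: no transitions and no expirations.
        return {"Rules": []}

    # An empty prefix applies the rule to every object in the bucket.
    rule = {"ID": f"{profile}-rule", "Status": "Enabled", "Filter": {"Prefix": ""}}

    if profile == "ml-optimized":
        rule["Transitions"] = [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
        ]
    elif profile == "compliance":
        # Retention enforcement (7 years) is outside the scope of this sketch.
        rule["Transitions"] = [{"Days": 90, "StorageClass": "GLACIER"}]
    elif profile == "development":
        rule["Expiration"] = {"Days": 90}
    else:
        raise ValueError(f"unknown profile: {profile}")

    return {"Rules": [rule]}


print(json.dumps(lifecycle_rules("ml-optimized"), indent=2))
```

Building the configuration as plain data keeps the sketch free of AWS calls, so it can be inspected or diffed before anything is applied to a bucket.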
Storage Class Optimization¶
| Storage Class | Cost/GB/Month* | Retrieval | Best For |
|---|---|---|---|
| S3 Standard | $0.023 | Instant | Active training data, models in use |
| S3 Standard-IA | $0.0125 | Instant | Completed experiments, archived models |
| S3 Glacier Flexible | $0.0036 | 3-5 hours | Historical data, compliance archives |
| S3 Glacier Deep Archive | $0.00099 | 12 hours | Long-term retention, rarely accessed |
| S3 Intelligent-Tiering | $0.023-$0.0036 | Automatic | Unpredictable access patterns |

*Prices: US East (N. Virginia). Check AWS S3 Pricing for your region.
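To make the price differences concrete, the sketch below follows a single gigabyte through its first year under two schedules: staying in S3 Standard, and a month-granular approximation of the ml-optimized transitions (1 month Standard, 2 months Standard-IA, 9 months Glacier Flexible). The schedule is a simplifying assumption, and the calculation ignores transition request fees, retrieval fees, and minimum storage durations, so it does not reproduce the pipeline-level savings quoted elsewhere in this guide.

```python
# Prices per GB-month for US East (N. Virginia), as listed in this guide.
PRICES = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.0036,
}


def first_year_cost_per_gb(schedule):
    """Sum the storage cost of 1 GB over (storage_class, months) steps."""
    return sum(PRICES[storage_class] * months for storage_class, months in schedule)


# All twelve months in S3 Standard.
standard_only = first_year_cost_per_gb([("STANDARD", 12)])

# Approximate ml-optimized schedule for data written on day 0.
tiered = first_year_cost_per_gb(
    [("STANDARD", 1), ("STANDARD_IA", 2), ("GLACIER", 9)]
)

print(f"Standard only: ${standard_only:.4f} per GB-year")
print(f"Tiered:        ${tiered:.4f} per GB-year")
print(f"Reduction:     {(1 - tiered / standard_only) * 100:.0f}%")
```

Under these assumptions the tiered schedule costs about $0.08 per GB-year against $0.276 for Standard only. Real savings depend on how much data is newly written each month and how often archived data is retrieved.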
Cost Reduction Strategies¶
1. Enable Lifecycle Policies (60-80% savings)¶
s3:
  lifecycle_policy: ml-optimized  # Don't use 'none' in production
3. Clean Up Experiment Data¶
Regularly remove old experiment data, failed training runs, and temporary artifacts:
Use gitkeep-none to remove .gitkeep markers from inactive solutions
Use purge-bucket to clean up across all solutions
4. Enable Versioning Selectively¶
# Dev/test environment
s3:
  versioning: false  # Disable to avoid storing old versions
# Production environment
s3:
  versioning: true   # Enable for compliance and rollback
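With versioning enabled, every overwrite keeps the previous version as a billable object. Storage for frequently updated objects therefore at least doubles, and it keeps growing with each retained version. Enable versioning only where it is needed.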
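The effect of versioning on billable storage can be sketched with simple arithmetic. The function below is a hypothetical helper that assumes each retained prior version is the same size as the live object. Keeping one prior version doubles storage, and each additional version adds another full copy.

```python
PRICE_PER_GB_MONTH = 0.023  # S3 Standard, US East (N. Virginia)


def versioned_storage_gb(live_gb: float, noncurrent_versions_kept: int) -> float:
    """Total billable GB when each object keeps N same-size prior versions."""
    if noncurrent_versions_kept < 0:
        raise ValueError("noncurrent_versions_kept must be >= 0")
    return live_gb * (1 + noncurrent_versions_kept)


# Compare 500 GB of live data with 0, 1, and 5 retained prior versions.
for kept in (0, 1, 5):
    total_gb = versioned_storage_gb(500, kept)
    monthly = total_gb * PRICE_PER_GB_MONTH
    print(f"{kept} prior versions: {total_gb:,.0f} GB, ${monthly:,.2f}/month")
```

If production buckets need versioning, limiting how many noncurrent versions are kept is the main lever for containing this growth.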
5. Use Development Profile for Non-Production¶
# Dev environment
s3:
  lifecycle_policy: development  # Auto-delete after 90 days
# Production environment
s3:
  lifecycle_policy: ml-optimized  # Transition to cheaper tiers
6. Monitor with S3 Analytics¶
Enable S3 Storage Class Analysis to identify objects that could be transitioned to cheaper storage classes. AWS recommends 30 days of monitoring before making transitions.
7. Use VPC Endpoints¶
If your S3 bucket is accessed from EC2 instances in a VPC that reach S3 through a NAT Gateway, use an S3 gateway VPC endpoint to avoid NAT Gateway data processing charges:
S3 Gateway Endpoint: Free
Saves $0.045/GB in NAT Gateway data processing charges
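The $0.045/GB figure translates into a monthly saving as follows. This is a minimal sketch: the helper name is hypothetical, the rate is the one quoted in this guide, and the traffic volume is an example value to replace with your own.

```python
NAT_PROCESSING_PER_GB = 0.045   # rate quoted in this guide; varies by region
GATEWAY_ENDPOINT_PER_GB = 0.0   # S3 gateway endpoints carry no charge


def monthly_nat_savings(s3_traffic_gb: float) -> float:
    """Monthly saving from moving S3 traffic off the NAT Gateway path."""
    return s3_traffic_gb * (NAT_PROCESSING_PER_GB - GATEWAY_ENDPOINT_PER_GB)


# Example: 10 TB of S3 traffic per month from instances in the VPC.
print(f"${monthly_nat_savings(10_000):,.2f} saved per month")
```

The saving scales linearly with traffic, so training jobs that repeatedly read large datasets from S3 benefit the most.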
Monitoring and Analysis¶
AWS Cost Explorer¶
# View total S3 costs for one month (the filter selects the S3 service;
# a per-bucket breakdown requires cost allocation tags)
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Simple Storage Service"]}}'
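The command returns JSON with one entry per time period under `ResultsByTime`. The sketch below shows one way to total a metric from that output. The function name `total_cost` is hypothetical and the sample payload is made up for illustration, not real billing data.

```python
import json
from decimal import Decimal


def total_cost(ce_output: str, metric: str = "BlendedCost") -> Decimal:
    """Sum one metric across all time periods in get-cost-and-usage output."""
    response = json.loads(ce_output)
    amounts = (
        Decimal(period["Total"][metric]["Amount"])
        for period in response["ResultsByTime"]
    )
    return sum(amounts, Decimal("0"))


# Made-up sample in the Cost Explorer response shape.
sample = json.dumps({
    "ResultsByTime": [
        {
            "TimePeriod": {"Start": "2026-03-01", "End": "2026-04-01"},
            "Total": {"BlendedCost": {"Amount": "85.20", "Unit": "USD"}},
            "Groups": [],
            "Estimated": False,
        }
    ]
})

print(total_cost(sample))
```

Amounts arrive as strings, so parsing them with `Decimal` avoids floating-point rounding when several periods are added together.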
S3 Storage Lens¶
Enable S3 Storage Lens for organization-wide visibility into storage usage, activity, and cost optimization opportunities.
CloudWatch Metrics¶
Monitor BucketSizeBytes and NumberOfObjects metrics to track storage growth over time.
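Once BucketSizeBytes readings have been collected, a rough growth rate can be derived from them. The sketch below assumes the readings have already been exported as (day number, bytes) pairs. The sample values are hypothetical, and the linear estimate is only a first approximation.

```python
def monthly_growth_gb(samples):
    """Average GB added per 30 days between the first and last sample."""
    (first_day, first_bytes), (last_day, last_bytes) = samples[0], samples[-1]
    if last_day <= first_day:
        raise ValueError("samples must be in ascending order and span at least one day")
    gb_per_day = (last_bytes - first_bytes) / 1e9 / (last_day - first_day)
    return gb_per_day * 30


# Hypothetical BucketSizeBytes readings as (day number, bytes).
samples = [
    (0, 1_000_000_000_000),
    (30, 1_150_000_000_000),
    (60, 1_300_000_000_000),
]

growth = monthly_growth_gb(samples)
extra_cost = growth * 0.023  # S3 Standard price per GB-month, US East
print(f"Growth: {growth:.0f} GB/month, about ${extra_cost:.2f} more each month")
```

A steady increase like this is a signal to check whether a lifecycle policy is in place before the storage bill compounds.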
Cost Estimation by Deployment Pattern¶
Pattern B: Dedicated Buckets (3 buckets, 1 solution each)¶
| Item | Monthly Cost |
|---|---|
| Storage (1 TB Standard × 3) | $69.00 |
| Requests (100K PUT, 1M GET × 3) | $2.70 |
| Data Transfer (50 GB out × 3) | $13.50 |
| Total | ~$85.20 |
Pattern A saves ~67% compared to Pattern B for the same total data volume.
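The Pattern B line items can be reproduced from per-unit rates. The sketch below assumes US East (N. Virginia) on-demand rates of $0.023 per GB-month for storage, $0.005 per 1,000 PUT requests, $0.0004 per 1,000 GET requests, and $0.09 per GB transferred out, with 1 TB counted as 1,000 GB and free-tier allowances ignored. These rates are consistent with the table's figures but are not stated in the guide itself.

```python
# Assumed US East (N. Virginia) on-demand rates.
STORAGE_PER_GB = 0.023
PUT_PER_1000 = 0.005
GET_PER_1000 = 0.0004
TRANSFER_OUT_PER_GB = 0.09


def bucket_monthly_cost(storage_gb, put_requests, get_requests, transfer_out_gb):
    """Return (storage, requests, transfer) monthly cost for one bucket."""
    storage = storage_gb * STORAGE_PER_GB
    requests = put_requests / 1000 * PUT_PER_1000 + get_requests / 1000 * GET_PER_1000
    transfer = transfer_out_gb * TRANSFER_OUT_PER_GB
    return storage, requests, transfer


BUCKETS = 3
per_bucket = bucket_monthly_cost(1000, 100_000, 1_000_000, 50)

# Scale each line item to three buckets and round to cents.
storage, requests, transfer = (round(item * BUCKETS, 2) for item in per_bucket)
total = round(storage + requests + transfer, 2)

print(f"Storage ${storage:.2f}, requests ${requests:.2f}, "
      f"transfer ${transfer:.2f}, total ${total:.2f}")
```

Changing the arguments to `bucket_monthly_cost` gives a quick estimate for other bucket sizes. For region-specific numbers, the built-in cost-estimate action remains the better source.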
Prices are estimates based on US East (N. Virginia). Actual costs vary by region and usage patterns. Use --action cost-estimate for region-specific pricing based on your configuration, or the AWS Pricing Calculator for detailed estimates.