Cost Optimization

Cost optimization strategies for S3 infrastructure provisioned by the S3 Provisioner.

Cost Overview

S3 costs are driven by three factors:

| Cost Factor | Description | Typical Impact |
|---|---|---|
| Storage | GB stored per month | 60-80% of total |
| Requests | PUT, GET, LIST operations | 10-20% of total |
| Data Transfer | Data out to internet or other regions | 10-30% of total |

Key insight: Lifecycle policies are the single most effective cost optimization. They can reduce storage costs by 60-80%.
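A storage-only saving reduces the total bill in proportion to storage's share of that bill. The following is a minimal sketch of that arithmetic; the function name and the example inputs (a 70% storage share and a 63% storage reduction) are illustrative assumptions picked from the ranges above, not measurements.

```python
def total_bill_reduction(storage_share: float, storage_reduction: float) -> float:
    """Fraction of the whole S3 bill saved when only the storage cost shrinks.

    storage_share:     storage cost as a fraction of the total bill (0..1)
    storage_reduction: fraction by which lifecycle rules cut storage cost (0..1)
    """
    if not (0.0 <= storage_share <= 1.0 and 0.0 <= storage_reduction <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    return storage_share * storage_reduction


# Illustrative inputs only: storage is 70% of the bill, lifecycle rules cut it by 63%.
print(f"{total_bill_reduction(0.70, 0.63):.0%} of the total bill")
```

With these inputs the overall bill drops by roughly 44%, which is why request and transfer costs still deserve attention after lifecycle rules are in place.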

Built-in Cost Estimation

The S3 Provisioner includes a built-in cost estimation feature that calculates costs specific to your configuration and region. Instead of using the generic estimates in this guide, run the cost estimation workflow for accurate numbers:

# Step 1: Generate usage assumptions (edit defaults to match your usage)
--action cost-traffic --solution master-solution

# Step 2: Calculate costs with region-specific pricing
--action cost-estimate --solution master-solution

The tool produces:

  • Console output with storage, request, transfer, and VPC Endpoint cost breakdown

  • Professional HTML report with per-category detail

  • Monthly and annual totals using on-demand pricing for your region

Edit the usage assumptions file and re-run cost-estimate to model different scenarios (e.g., 100 GB vs. 1 TB storage, low vs. high request volume).

To update pricing data to the latest AWS rates:

--action cost-refresh-prices --solution master-solution

See the User Guide for complete command reference.

Lifecycle Policy Comparison

The S3 Provisioner supports four lifecycle profiles. The figures below are annual costs for a 100 TB ML pipeline:

| Policy | Year 1 Cost | Year 2 Cost | 5-Year Total | Savings |
|---|---|---|---|---|
| none | $27,600 | $27,600 | $138,000 | 0% |
| ml-optimized | $10,140 | $10,140 | $50,700 | 63% |
| compliance | $7,080 | $7,080 | $35,400 | 74% |
| development | ~$2,300 | $0 | $2,300 | 98% |
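The Savings column can be reproduced from the 5-year totals. This sketch uses only the figures in the table above; the names `FIVE_YEAR_TOTALS` and `savings_pct` are illustrative.

```python
# 5-year totals in USD, copied from the comparison table.
FIVE_YEAR_TOTALS = {
    "none": 138_000,
    "ml-optimized": 50_700,
    "compliance": 35_400,
    "development": 2_300,
}


def savings_pct(total: float, baseline: float = FIVE_YEAR_TOTALS["none"]) -> int:
    """Percentage saved relative to the 'none' profile, rounded to a whole percent."""
    return round(100 * (1 - total / baseline))


for profile, total in FIVE_YEAR_TOTALS.items():
    print(f"{profile:<13} ${total:>8,}  {savings_pct(total)}% saved")
```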

Profile Details

ml-optimized (recommended for production):

  • 30 days → S3 Standard-IA (infrequent access)

  • 90 days → S3 Glacier Flexible Retrieval

  • Balances cost with data accessibility for ML workflows

compliance (regulated industries):

  • 90 days → S3 Glacier Flexible Retrieval

  • 7-year retention for audit requirements

  • Lowest long-term cost for data that must be retained

development (dev/test environments):

  • 90-day expiration — data automatically deleted

  • Zero long-term storage cost

  • Ideal for experiment data and temporary artifacts

none (manual management):

  • No automatic transitions or expirations

  • Full control but highest cost if not actively managed
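For orientation, the profiles above map naturally onto S3 lifecycle rules. The sketch below builds the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. It is an assumption about how such profiles could be expressed, not the S3 Provisioner's actual rules: the rule IDs, the whole-bucket filter, and the reading of "7-year retention" as expiration after 7 × 365 days are all hypothetical.

```python
from typing import Optional


def lifecycle_rules(profile: str) -> Optional[dict]:
    """Return a LifecycleConfiguration-shaped dict for a profile, or None for 'none'."""
    # Hypothetical rule skeleton: applies to every object in the bucket.
    rule = {"ID": f"{profile}-rule", "Status": "Enabled", "Filter": {"Prefix": ""}}

    if profile == "ml-optimized":
        rule["Transitions"] = [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
        ]
    elif profile == "compliance":
        rule["Transitions"] = [{"Days": 90, "StorageClass": "GLACIER"}]
        # Assumption: objects expire once the 7-year retention period ends.
        rule["Expiration"] = {"Days": 7 * 365}
    elif profile == "development":
        rule["Expiration"] = {"Days": 90}
    elif profile == "none":
        return None  # No lifecycle configuration is applied.
    else:
        raise ValueError(f"unknown lifecycle profile: {profile}")

    return {"Rules": [rule]}


print(lifecycle_rules("ml-optimized"))
```

A returned dictionary could be passed to `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`; in normal use the provisioner applies the profile for you.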

Storage Class Optimization

| Storage Class | Cost/GB/Month* | Retrieval | Best For |
|---|---|---|---|
| S3 Standard | $0.023 | Instant | Active training data, models in use |
| S3 Standard-IA | $0.0125 | Instant | Completed experiments, archived models |
| S3 Glacier Flexible | $0.0036 | 3-5 hours | Historical data, compliance archives |
| S3 Glacier Deep Archive | $0.00099 | 12 hours | Long-term retention, rarely accessed |
| S3 Intelligent-Tiering | $0.023-$0.0036 | Automatic | Unpredictable access patterns |

*Prices: US East (N. Virginia). Check AWS S3 Pricing for your region.
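To see what these per-GB rates mean at scale, the sketch below prices 100 TB in each class using the US East (N. Virginia) figures from the table. It assumes 1 TB = 1,000 GB and covers storage charges only; retrieval fees, minimum storage durations, and transition request charges are deliberately left out. The dictionary keys are illustrative labels.

```python
# USD per GB-month, copied from the storage class table (US East, N. Virginia).
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_FLEXIBLE": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}

GB_PER_TB = 1000  # Assumption: decimal terabytes.


def monthly_storage_cost(gb: float, storage_class: str) -> float:
    """Storage-only monthly cost in USD, rounded to cents."""
    return round(gb * PRICE_PER_GB_MONTH[storage_class], 2)


for storage_class in PRICE_PER_GB_MONTH:
    cost = monthly_storage_cost(100 * GB_PER_TB, storage_class)
    print(f"{storage_class:<17} ${cost:>9,.2f}/month")
```

At these rates, 100 TB in S3 Standard comes to $2,300 per month, or $27,600 per year, which matches the `none` row in the lifecycle comparison.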

Cost Reduction Strategies

1. Enable Lifecycle Policies (60-80% savings)

s3:
  lifecycle_policy: ml-optimized  # Don't use 'none' in production

2. Use Shared Buckets (Pattern A)

Deploy multiple ML solutions to a single bucket instead of dedicated buckets per solution:

  • Fewer buckets = simpler management

  • Shared lifecycle policies across solutions

  • Lower per-solution overhead

3. Clean Up Experiment Data

Regularly remove old experiment data, failed training runs, and temporary artifacts:

  • Use gitkeep-none to remove .gitkeep markers from inactive solutions

  • Use purge-bucket to clean up across all solutions

4. Enable Versioning Selectively

# Dev/test environment
s3:
  versioning: false  # Avoid storing old versions

# Production environment
s3:
  versioning: true   # Keep old versions for compliance and rollback

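With versioning enabled, every overwrite keeps the previous copy as a noncurrent version that is billed at normal storage rates, so frequently updated objects can consume several times their current size. Enable it only where needed.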
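The storage impact of versioning grows with the number of overwrites. The sketch below is a simplified model that assumes each overwrite rewrites the full object and that noncurrent versions stay in the same storage class; the function and parameter names are illustrative.

```python
from typing import Optional


def versioned_storage_gb(
    object_gb: float, overwrites: int, retained_noncurrent: Optional[int] = None
) -> float:
    """Total GB held for one object after a number of full overwrites.

    retained_noncurrent caps how many old versions are kept (for example via a
    noncurrent-version expiration rule); None means every old version is kept.
    """
    if retained_noncurrent is None:
        noncurrent = overwrites
    else:
        noncurrent = min(overwrites, retained_noncurrent)
    return object_gb * (1 + noncurrent)  # current version + noncurrent versions


# A 5 GB checkpoint overwritten 30 times, keeping every version vs. the last 3.
print(versioned_storage_gb(5, 30))
print(versioned_storage_gb(5, 30, retained_noncurrent=3))
```

In this example, keeping every version holds 155 GB for a 5 GB object, while capping noncurrent versions at three holds 20 GB.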

5. Use Development Profile for Non-Production

# Dev environment
s3:
  lifecycle_policy: development  # Auto-delete after 90 days

# Production environment
s3:
  lifecycle_policy: ml-optimized  # Transition to cheaper tiers

6. Monitor with S3 Analytics

Enable S3 Storage Class Analysis to identify objects that could be transitioned to cheaper storage classes. AWS recommends 30 days of monitoring before making transitions.

7. Use VPC Endpoints

If your S3 bucket is accessed from EC2 instances in a VPC, use a VPC endpoint so that S3 traffic bypasses the NAT Gateway and its per-GB data processing charge:

  • S3 Gateway Endpoint: Free

  • Saves $0.045/GB in NAT Gateway data processing charges
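A quick way to size this saving is to multiply the monthly S3 traffic that currently flows through the NAT Gateway by the per-GB rate above. The sketch below assumes the $0.045/GB figure applies to your region; the 10 TB traffic volume is a made-up example.

```python
NAT_RATE_PER_GB = 0.045  # USD per GB, taken from the bullet above.


def nat_savings(gb_per_month: float) -> float:
    """Monthly USD avoided by routing S3 traffic through a gateway endpoint."""
    return round(gb_per_month * NAT_RATE_PER_GB, 2)


# Hypothetical workload: 10 TB of training data pulled from S3 each month.
print(f"${nat_savings(10_000):,.2f} per month")
```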

Monitoring and Analysis

AWS Cost Explorer

# View total S3 cost for March 2026 (End is exclusive; this query does not break costs down by bucket)
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Simple Storage Service"]}}'
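Cost Explorer treats the `End` date as exclusive, which is why a query for March ends on 2026-04-01. When scripting monthly reports, the period can be derived rather than typed by hand. This is a minimal sketch; `month_period` is an illustrative helper, not part of the S3 Provisioner.

```python
from datetime import date


def month_period(year: int, month: int) -> dict:
    """Return a Cost Explorer TimePeriod covering one calendar month."""
    start = date(year, month, 1)
    # End is exclusive, so it is the first day of the following month.
    if month == 12:
        end = date(year + 1, 1, 1)
    else:
        end = date(year, month + 1, 1)
    return {"Start": start.isoformat(), "End": end.isoformat()}


print(month_period(2026, 3))
# The result can be passed as TimePeriod to boto3's
# ce_client.get_cost_and_usage(TimePeriod=..., Granularity="MONTHLY", ...).
```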

S3 Storage Lens

Enable S3 Storage Lens for organization-wide visibility into storage usage, activity, and cost optimization opportunities.

CloudWatch Metrics

Monitor BucketSizeBytes and NumberOfObjects metrics to track storage growth over time.
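S3 publishes these storage metrics to CloudWatch once per day, and `BucketSizeBytes` is reported per storage type. The sketch below builds the arguments for CloudWatch's `get_metric_statistics` call without making the request; the bucket name and the 30-day window are placeholders.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def bucket_size_query(
    bucket: str,
    days: int = 30,
    storage_type: str = "StandardStorage",
    now: Optional[datetime] = None,
) -> dict:
    """Build get_metric_statistics arguments for daily BucketSizeBytes samples."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": storage_type},
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86400,  # One data point per day, matching the reporting interval.
        "Statistics": ["Average"],
    }


# "example-bucket" is a placeholder name.
print(bucket_size_query("example-bucket")["Dimensions"])
```

The result can be unpacked into `boto3.client("cloudwatch").get_metric_statistics(**query)`. For `NumberOfObjects`, change the metric name and use the `AllStorageTypes` storage type.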

Cost Estimation by Deployment Pattern

Pattern A: Shared Bucket (1 bucket, 3 solutions)

| Item | Monthly Cost |
|---|---|
| Storage (1 TB Standard) | $23.00 |
| Requests (100K PUT, 1M GET) | $0.90 |
| Data Transfer (50 GB out) | $4.50 |
| Total | ~$28.40 |

Pattern B: Dedicated Buckets (3 buckets, 1 solution each)

| Item | Monthly Cost |
|---|---|
| Storage (1 TB Standard × 3) | $69.00 |
| Requests (100K PUT, 1M GET × 3) | $2.70 |
| Data Transfer (50 GB out × 3) | $13.50 |
| Total | ~$85.20 |

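In this example Pattern A costs ~67% less than Pattern B, but the tables compare 1 TB in total against 3 TB in total (1 TB per bucket). S3 bills for storage, requests, and data transfer rather than per bucket, so the difference reflects the smaller total data volume, not the bucket count itself.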
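The pattern totals can be reproduced from per-unit rates. The sketch below assumes US East (N. Virginia) on-demand rates that are consistent with the table rows ($0.023/GB storage, $0.005 per 1,000 PUT, $0.0004 per 1,000 GET, $0.09/GB transfer out) and 1 TB = 1,000 GB; `monthly_cost` is an illustrative helper.

```python
STORAGE_PER_GB = 0.023       # USD per GB-month, S3 Standard
PUT_PER_1K = 0.005           # USD per 1,000 PUT requests
GET_PER_1K = 0.0004          # USD per 1,000 GET requests
TRANSFER_OUT_PER_GB = 0.09   # USD per GB out to the internet


def monthly_cost(storage_gb: float, puts: int, gets: int, transfer_gb: float) -> float:
    """Monthly USD cost for one bucket's storage, requests, and transfer."""
    storage = storage_gb * STORAGE_PER_GB
    requests = puts / 1000 * PUT_PER_1K + gets / 1000 * GET_PER_1K
    transfer = transfer_gb * TRANSFER_OUT_PER_GB
    return round(storage + requests + transfer, 2)


pattern_a = monthly_cost(1000, 100_000, 1_000_000, 50)
pattern_b = round(3 * monthly_cost(1000, 100_000, 1_000_000, 50), 2)
shared_3tb = monthly_cost(3000, 300_000, 3_000_000, 150)

print(f"Pattern A (1 bucket, 1 TB):          ${pattern_a:.2f}")
print(f"Pattern B (3 buckets, 1 TB each):    ${pattern_b:.2f}")
print(f"One shared bucket holding all 3 TB:  ${shared_3tb:.2f}")
```

Under these rates, one shared bucket holding 3 TB costs the same as three dedicated 1 TB buckets, so the bill is driven by total usage rather than by the number of buckets.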

Prices are estimates based on US East (N. Virginia). Actual costs vary by region and usage patterns. Use --action cost-estimate for region-specific pricing based on your configuration, or the AWS Pricing Calculator for detailed estimates.