Configuration Reference

Quick Reference

| Parameter | Required | Type | Default | Valid Values |
|---|---|---|---|---|
| client.company_name | ✅ | string | - | Any string |
| client.company_prefix | ✅ | string | - | Lowercase, no spaces |
| client.account_id | ✅ | string | - | 12 digits (quoted) |
| client.tenant_id | ✅ | string | - | 4 alphanumeric characters (quoted) |
| environment.env | ✅ | string | - | prod, dev, test, staging |
| environment.region | ✅ | string | - | Valid AWS region |
| s3.bucket_name_override | ❌ | string | "" | Valid S3 bucket name or "" |
| s3.versioning | ❌ | boolean | false | true, false |
| s3.lifecycle_policy | ❌ | string | none | ml-optimized, compliance, development, none |
| s3.vpc_id | ❌ | string | "" | vpc-xxx or "" |
| s3.route_table_ids | ❌ | string | "" | rtb-xxx,rtb-yyy or "" |
| s3.tags | ❌ | object | {} | Key-value pairs |

Configuration File Structure

The S3 Provisioner uses a YAML configuration file with three main sections:

client:
  company_name: Edge Corp
  company_prefix: edge
  account_id: "123456789012"
  tenant_id: "a001"

environment:
  env: prod
  region: us-west-1

s3:
  bucket_name_override: ""
  versioning: false
  lifecycle_policy: ml-optimized
  vpc_id: ""
  route_table_ids: ""
  tags:
    Purpose: ML Solutions Portfolio
    ManagedBy: CloudFormation
    Owner: data-science

Section 1: Client Configuration

company_name ✅ Required

  • Type: string

  • Description: Full company name

  • Example: Edge Corp, Acme Corporation

company_prefix ✅ Required

  • Type: string

  • Description: Short company identifier (lowercase, no spaces)

  • Constraints: Used in bucket naming, so it must also be valid in an S3 bucket name

  • Example: edge, acme

account_id ✅ Required

  • Type: string (quoted)

  • Description: 12-digit AWS account ID

  • Format: Must be quoted to preserve leading zeros

  • Example: "123456789012"

tenant_id ✅ Required

  • Type: string (quoted)

  • Description: Human-readable account identifier

  • Format: Must be quoted, 4 alphanumeric characters

  • Example: "a001"

Section 2: Environment Configuration

env ✅ Required

  • Type: string

  • Description: Environment name

  • Valid Values: prod, dev, test, staging

  • Example: prod

region ✅ Required

  • Type: string

  • Description: AWS region for S3 bucket

  • Valid Values: Any valid AWS region (us-east-1, us-west-2, eu-west-1, etc.)

  • Example: us-west-1

Section 3: S3 Configuration

bucket_name_override ❌ Optional

  • Type: string

  • Default: Auto-generated from client/environment values

  • Description: Override auto-generated bucket name

  • Format: Must follow S3 bucket naming rules (lowercase, no underscores)

s3:
  bucket_name_override: ""  # Use auto-generated name
  # OR
  # bucket_name_override: "my-custom-bucket-name"

versioning ❌ Optional

  • Type: boolean

  • Default: false

  • Description: Enable S3 bucket versioning

  • Note: Versioning is disabled by default. Set to true for production environments.

s3:
  versioning: false  # Versioning disabled (default)
  # OR
  # versioning: true   # Versioning enabled (recommended for production)

Production Recommendation: Always enable versioning in production to:

  • Protect against accidental deletions

  • Maintain object history for compliance

  • Enable point-in-time recovery

  • Support disaster recovery scenarios

lifecycle_policy ❌ Optional

  • Type: string

  • Default: none

  • Description: Predefined lifecycle policy profile applied to the bucket

  • Valid Values: ml-optimized, compliance, development, none

s3:
  lifecycle_policy: ml-optimized  # or compliance, development, none

Lifecycle Policy Profiles

ml-optimized - Production ML workloads with cost optimization

  • Transitions: STANDARD → STANDARD_IA (30 days) → GLACIER (90 days)

  • Expiration: None (data retained indefinitely)

  • Applies to: All data under solutions/ prefix

  • Use case: Active ML pipelines with long-term data retention

compliance - HIPAA/PCI regulated industries

  • Transitions: STANDARD → GLACIER (90 days)

  • Expiration: 2555 days (7 years)

  • Applies to: All data under solutions/ prefix

  • Use case: Regulated data with mandatory retention periods

development - Dev/staging environments

  • Transitions: None

  • Expiration: 90 days

  • Applies to: All data under solutions/ prefix

  • Use case: Temporary development/testing data

none - No lifecycle rules (default)

  • Transitions: None

  • Expiration: None

  • Applies to: N/A

  • Use case: Manual lifecycle management or no lifecycle needed
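To make the profiles concrete, here is a minimal Python sketch that expresses each one as a dict in the shape the S3 PutBucketLifecycleConfiguration API accepts. The provisioner may build its rules differently (for example through CloudFormation, whose property names differ), so treat `PROFILES`, `lifecycle_configuration`, and the rule IDs as hypothetical illustrations rather than the tool's code.

```python
from typing import Any, Dict, List, Optional

# Profile parameters from the descriptions above (hypothetical encoding):
# transitions are (days, storage_class) pairs; expiration_days=None means "never".
PROFILES: Dict[str, Optional[Dict[str, Any]]] = {
    "ml-optimized": {"transitions": [(30, "STANDARD_IA"), (90, "GLACIER")], "expiration_days": None},
    "compliance": {"transitions": [(90, "GLACIER")], "expiration_days": 2555},
    "development": {"transitions": [], "expiration_days": 90},
    "none": None,  # no lifecycle rules at all
}


def lifecycle_configuration(
    profile: str, prefix: str = "solutions/"
) -> Optional[Dict[str, List[Dict[str, Any]]]]:
    """Translate a profile name into an S3 LifecycleConfiguration-shaped dict.

    Returns None for "none": there is no lifecycle configuration to send.
    """
    if profile not in PROFILES:
        raise ValueError(f"Invalid lifecycle_policy value: {profile!r}")
    spec = PROFILES[profile]
    if spec is None:
        return None
    rule: Dict[str, Any] = {
        "ID": f"{profile}-solutions",  # hypothetical rule ID
        "Filter": {"Prefix": prefix},  # applies to every object under solutions/
        "Status": "Enabled",
    }
    if spec["transitions"]:
        rule["Transitions"] = [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in spec["transitions"]
        ]
    if spec["expiration_days"] is not None:
        rule["Expiration"] = {"Days": spec["expiration_days"]}
    return {"Rules": [rule]}


print(lifecycle_configuration("compliance"))
```

With boto3, a non-None result could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.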

Lifecycle Policy Details

| Profile | 30 Days | 90 Days | Expiration | Est. Cost Savings |
|---|---|---|---|---|
| ml-optimized | → STANDARD_IA | → GLACIER | Never | ~60-70% |
| compliance | - | → GLACIER | 7 years (2555 days) | ~70-80% |
| development | - | - | 90 days | ~100% (deleted) |
| none | - | - | Never | 0% |

Note: For custom lifecycle rules beyond these profiles, see ML_LIFECYCLE_POLICIES.md for manual implementation guidance.

vpc_id ❌ Optional

  • Type: string

  • Description: VPC ID for S3 Gateway VPC endpoint configuration

  • Format: vpc-xxxxxxxxxxxxxxxxx or empty string

  • Purpose: Creates an S3 Gateway endpoint that enables private S3 access from within the VPC without an internet gateway

s3:
  vpc_id: ""  # No VPC endpoint
  # OR
  # vpc_id: "vpc-0a1b2c3d4e5f6a7b8"  # Enable S3 Gateway endpoint

Benefits of S3 Gateway Endpoint:

  • Private connectivity to S3 without an internet gateway

  • No charge for the endpoint itself, and same-region S3 traffic bypasses NAT gateway data processing charges

  • Enhanced security by keeping traffic within the AWS network

  • Supports compliance scenarios that prohibit internet access

route_table_ids ❌ Optional

  • Type: string

  • Description: Comma-separated route table IDs to associate with the S3 Gateway endpoint

  • Format: rtb-xxx,rtb-yyy or empty string

  • Required: Must be provided when vpc_id is specified

  • Purpose: Subnets associated with these route tables reach S3 through the gateway endpoint

s3:
  route_table_ids: ""  # No route tables (vpc_id must also be empty)
  # OR
  # route_table_ids: "rtb-0a1b2c3d,rtb-4e5f6a7b"  # Multiple route tables

Note: Both vpc_id and route_table_ids must be configured together to create an S3 Gateway endpoint. If one is provided, the other must also be provided.
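The pairing rule can be checked before deployment. The sketch below is hypothetical (`parse_gateway_endpoint` and the regular expressions are not part of the tool) and assumes IDs follow the format AWS issues: the `vpc-` or `rtb-` prefix followed by 8 or 17 lowercase hexadecimal characters.

```python
import re
from typing import List, Tuple

# Assumed ID formats: prefix plus 8 or 17 lowercase hexadecimal characters.
VPC_ID_RE = re.compile(r"vpc-(?:[0-9a-f]{8}|[0-9a-f]{17})")
RTB_ID_RE = re.compile(r"rtb-(?:[0-9a-f]{8}|[0-9a-f]{17})")


def parse_gateway_endpoint(vpc_id: str, route_table_ids: str) -> Tuple[str, List[str]]:
    """Validate the vpc_id / route_table_ids pair (hypothetical helper).

    Returns ("", []) when no endpoint is requested; otherwise the VPC ID and
    the parsed route table IDs. Raises ValueError on any violation.
    """
    tables = [t.strip() for t in route_table_ids.split(",") if t.strip()]
    if not vpc_id and not tables:
        return "", []  # both empty: no S3 Gateway endpoint
    if not vpc_id or not tables:
        raise ValueError("vpc_id and route_table_ids must be provided together")
    if not VPC_ID_RE.fullmatch(vpc_id):
        raise ValueError(f"Invalid vpc_id format: {vpc_id!r}")
    for table_id in tables:
        if not RTB_ID_RE.fullmatch(table_id):
            raise ValueError(f"Invalid route table ID: {table_id!r}")
    return vpc_id, tables


print(parse_gateway_endpoint("vpc-0a1b2c3d4e5f6a7b8", "rtb-0a1b2c3d,rtb-4e5f6a7b"))
```

Whitespace around the commas is tolerated here; whether the tool does the same is not stated above.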

tags ❌ Optional

  • Type: object (key-value pairs)

  • Description: Custom tags for S3 bucket

  • Note: 7 system tags are automatically applied

s3:
  tags:
    Purpose: ML Solutions Portfolio
    ManagedBy: CloudFormation
    Owner: data-science
    CostCenter: ML-Team
    Project: MLOps-Suite

System Tags (Automatically Applied)

The S3 Provisioner automatically applies 7 mandatory system tags:

| Tag Key | Source | Example Value |
|---|---|---|
| Company | client.company_name | Edge Corp |
| CompanyPrefix | client.company_prefix | edge |
| Environment | environment.env | prod |
| TenantId | client.tenant_id | a001 |
| Region | environment.region | us-west-1 |
| TotalFolders | Calculated | 12 |
| ConfigFile | Derived | edge-prod-a001-us-west-1-s3.yaml |
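As a sketch of how the table maps onto a parsed configuration, the hypothetical helper below assembles the seven system tags. The table marks `ConfigFile` as derived and `TotalFolders` as calculated without saying how, so this version simply takes both as arguments.

```python
from typing import Any, Dict, Mapping


def system_tags(config: Mapping[str, Mapping[str, Any]], config_file: str,
                total_folders: int) -> Dict[str, str]:
    """Build the seven system tags from the client and environment sections."""
    client = config["client"]
    env = config["environment"]
    return {
        "Company": str(client["company_name"]),
        "CompanyPrefix": str(client["company_prefix"]),
        "Environment": str(env["env"]),
        "TenantId": str(client["tenant_id"]),
        "Region": str(env["region"]),
        "TotalFolders": str(total_folders),  # tag values are always strings
        "ConfigFile": config_file,
    }


config = {
    "client": {"company_name": "Edge Corp", "company_prefix": "edge",
               "account_id": "123456789012", "tenant_id": "a001"},
    "environment": {"env": "prod", "region": "us-west-1"},
}
tags = system_tags(config, "edge-prod-a001-us-west-1-s3.yaml", 12)
print(tags)
```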

Custom Tags (Optional)

You can add up to 43 custom tags (AWS limit is 50 tags total, 7 are system tags).

Tag Constraints:

  • Key length: 1-128 characters

  • Value length: 0-256 characters

  • Case sensitive

  • Keys must not start with the reserved aws: prefix

  • Avoid spaces in keys (use hyphens or camelCase)
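A pre-deployment check for these constraints might look like the following hypothetical sketch. It reserves room for the 7 system tags, rejects the `aws:` prefix that AWS reserves for its own tags, and treats spaces in keys as an error, a stricter stance than AWS itself, which accepts spaces in tag keys.

```python
from typing import Mapping

MAX_TAGS = 50          # AWS per-bucket tag limit
SYSTEM_TAG_COUNT = 7   # tags applied automatically by the S3 Provisioner


def validate_custom_tags(tags: Mapping[str, str]) -> None:
    """Raise ValueError if custom tags break the constraints (hypothetical helper)."""
    limit = MAX_TAGS - SYSTEM_TAG_COUNT
    if len(tags) > limit:
        raise ValueError(f"Too many custom tags: {len(tags)} > {limit}")
    for raw_key, raw_value in tags.items():
        key, value = str(raw_key), str(raw_value)  # YAML may parse values as non-strings
        if not 1 <= len(key) <= 128:
            raise ValueError(f"Tag key must be 1-128 characters: {key!r}")
        if len(value) > 256:
            raise ValueError(f"Tag value for {key!r} exceeds 256 characters")
        if key.lower().startswith("aws:"):
            raise ValueError(f"Tag key uses the reserved aws: prefix: {key!r}")
        if " " in key:
            raise ValueError(f"Tag key contains a space: {key!r}")


validate_custom_tags({"Purpose": "ML Solutions Portfolio", "Owner": "data-science"})
```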

Usage Assumptions File

The cost-traffic action generates a usage assumptions YAML file used by cost-estimate to calculate S3 infrastructure costs. This file is saved in the configs/ directory.

File Naming

<bucket-name>-usage.yaml

Example: edge-prod-b001-us-west-1-s3-usage.yaml

File Structure

# Auto-generated S3 usage assumptions for cost estimation
# Edit values to match your expected monthly usage

usage:
  storage:
    storage_class: Standard
    data_gb: 100
  requests:
    put_requests_per_month: 10000
    get_requests_per_month: 50000
  transfer:
    data_out_gb_per_month: 10
  vpc_endpoint:
    S3VPCEndpoint:
      type: AWS::EC2::VPCEndpoint
      data_gb_per_month: 50

Parameters

storage_class

  • Type: string

  • Generated: Yes (do not modify)

  • Description: S3 storage class used for pricing lookup

data_gb

  • Type: integer

  • Generated: Yes (with default of 100)

  • Description: Expected total storage in GB

  • Action: Edit to match your expected data volume

put_requests_per_month

  • Type: integer

  • Generated: Yes (with default of 10,000)

  • Description: Expected monthly PUT/COPY/POST/LIST requests

get_requests_per_month

  • Type: integer

  • Generated: Yes (with default of 50,000)

  • Description: Expected monthly GET and other requests

data_out_gb_per_month

  • Type: integer

  • Generated: Yes (with default of 10)

  • Description: Expected monthly data transfer out of AWS in GB

vpc_endpoint (present only if VPC Endpoint is in the template)

  • data_gb_per_month: Expected monthly data through the VPC Endpoint in GB

Usage

  1. Run cost-traffic to generate the file with defaults

  2. Edit values to reflect your expected storage, requests, and transfer

  3. Run cost-estimate to calculate costs based on your assumptions

  4. Re-edit and re-run to model different scenarios
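Steps 3 and 4 can be approximated offline. The sketch below is not the tool's `cost-estimate` logic: `estimate_monthly_cost` is hypothetical, and the unit prices in `EXAMPLE_RATES` are illustrative S3 Standard list prices for one region that change over time, so check the AWS Pricing Calculator before using any figure. It takes the mapping under `usage:` once the YAML has been parsed.

```python
from typing import Any, Dict, Mapping

# Illustrative unit prices in USD (S3 Standard list prices for one region at one
# point in time). Placeholders only: confirm current pricing before relying on them.
EXAMPLE_RATES: Dict[str, float] = {
    "storage_gb_month": 0.023,  # storage, per GB-month
    "put_per_1000": 0.005,      # PUT/COPY/POST/LIST, per 1,000 requests
    "get_per_1000": 0.0004,     # GET and other requests, per 1,000 requests
    "data_out_gb": 0.09,        # data transfer out to the internet, per GB
}


def estimate_monthly_cost(usage: Mapping[str, Any],
                          rates: Mapping[str, float] = EXAMPLE_RATES) -> float:
    """Rough monthly cost for the mapping under `usage:` (hypothetical helper).

    S3 Gateway endpoints carry no charge, so vpc_endpoint data is ignored.
    """
    storage = usage["storage"]["data_gb"] * rates["storage_gb_month"]
    requests = (
        usage["requests"]["put_requests_per_month"] / 1000 * rates["put_per_1000"]
        + usage["requests"]["get_requests_per_month"] / 1000 * rates["get_per_1000"]
    )
    transfer = usage["transfer"]["data_out_gb_per_month"] * rates["data_out_gb"]
    return round(storage + requests + transfer, 2)


# The generated defaults, then a few storage scenarios (steps 3 and 4).
defaults = {
    "storage": {"storage_class": "Standard", "data_gb": 100},
    "requests": {"put_requests_per_month": 10000, "get_requests_per_month": 50000},
    "transfer": {"data_out_gb_per_month": 10},
}
for data_gb in (100, 1000, 10000):
    scenario = {**defaults, "storage": {**defaults["storage"], "data_gb": data_gb}}
    print(f"{data_gb:>6} GB -> ${estimate_monthly_cost(scenario):,.2f}/month")
```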

Complete Configuration Examples

Example 1: Production ML Workload

client:
  company_name: Edge Corp
  company_prefix: edge
  account_id: "123456789012"
  tenant_id: "a001"

environment:
  env: prod
  region: us-west-1

s3:
  bucket_name_override: ""
  versioning: true
  lifecycle_policy: ml-optimized
  vpc_id: ""
  route_table_ids: ""
  tags:
    Purpose: ML Solutions Portfolio
    ManagedBy: CloudFormation
    Owner: data-science
    CostCenter: ML-Team
    Project: MLOps-Suite

Result: Production bucket with versioning enabled, ml-optimized lifecycle (30d→IA, 90d→GLACIER), no expiration.

Example 2: Compliance Environment (HIPAA)

client:
  company_name: Healthcare Inc
  company_prefix: health
  account_id: "123456789012"
  tenant_id: "a002"

environment:
  env: prod
  region: us-east-1

s3:
  bucket_name_override: ""
  versioning: true
  lifecycle_policy: compliance
  vpc_id: "vpc-0a1b2c3d4e5f6a7b8"
  route_table_ids: "rtb-0a1b2c3d,rtb-4e5f6a7b"
  tags:
    Purpose: Patient Data Storage
    Compliance: HIPAA
    DataClassification: PHI
    Owner: compliance-team
    ManagedBy: CloudFormation

Result: Compliance bucket with versioning, 7-year retention (90d→GLACIER, expires after 2555 days), VPC endpoint.

Example 3: Development Environment

client:
  company_name: Acme Corp
  company_prefix: acme
  account_id: "123456789012"
  tenant_id: "a003"

environment:
  env: dev
  region: us-west-2

s3:
  bucket_name_override: ""
  versioning: false
  lifecycle_policy: development
  vpc_id: ""
  route_table_ids: ""
  tags:
    Purpose: Development Testing
    Owner: dev-team

Result: Development bucket with no versioning, 90-day expiration, minimal tags.

Example 4: Custom Bucket Name

client:
  company_name: Tech Startup
  company_prefix: tech
  account_id: "123456789012"
  tenant_id: "a004"

environment:
  env: prod
  region: eu-west-1

s3:
  bucket_name_override: "tech-ml-data-prod-eu"
  versioning: true
  lifecycle_policy: ml-optimized
  vpc_id: ""
  route_table_ids: ""
  tags:
    Purpose: ML Data Lake
    Owner: ml-platform-team

Result: Custom-named bucket with ml-optimized lifecycle.

Example 5: No Lifecycle Policy

client:
  company_name: Finance Corp
  company_prefix: finance
  account_id: "123456789012"
  tenant_id: "a005"

environment:
  env: prod
  region: us-east-1

s3:
  bucket_name_override: ""
  versioning: true
  lifecycle_policy: none
  vpc_id: ""
  route_table_ids: ""
  tags:
    Purpose: Financial Data
    Owner: finance-team

Result: Production bucket with versioning, no lifecycle rules (manual management).

Bucket Naming Convention

When bucket_name_override is empty, the tool auto-generates bucket names using this pattern:

{company_prefix}-{env}-{tenant_id}-{region}-s3

Examples:

  • edge-prod-a001-us-west-1-s3

  • acme-dev-a003-us-west-2-s3

  • health-prod-a002-us-east-1-s3
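The naming pattern reduces to a single format string. This hypothetical helper assumes the `-s3` suffix that appears in the example bucket names and configuration file names, and returns `bucket_name_override` unchanged when it is set.

```python
def bucket_name(company_prefix: str, env: str, tenant_id: str, region: str,
                override: str = "") -> str:
    """Return the override if set, otherwise the auto-generated name.

    Hypothetical helper; the "-s3" suffix is inferred from the examples.
    """
    if override:
        return override
    return f"{company_prefix}-{env}-{tenant_id}-{region}-s3"


print(bucket_name("edge", "prod", "a001", "us-west-1"))
print(bucket_name("tech", "prod", "a004", "eu-west-1", override="tech-ml-data-prod-eu"))
```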

S3 Bucket Naming Rules:

  • 3-63 characters

  • Lowercase letters, numbers, and hyphens (S3 also allows periods, but they are not recommended)

  • Must start/end with letter or number

  • No underscores, spaces, or uppercase

  • Globally unique across all AWS accounts
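The rules above translate into a length check plus one regular expression. This is a hypothetical sketch: it rejects periods and cannot test global uniqueness, which only S3 can confirm when the bucket is created.

```python
import re

# Lowercase letters, digits, and hyphens; first and last characters must be a
# letter or digit. Periods are deliberately rejected in this sketch.
_BUCKET_CHARS = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_bucket_name(name: str) -> bool:
    """Check length and character rules (hypothetical helper).

    Global uniqueness cannot be checked locally; S3 reports a conflict only
    when the bucket is created.
    """
    return 3 <= len(name) <= 63 and _BUCKET_CHARS.fullmatch(name) is not None


for candidate in ("edge-prod-a001-us-west-1-s3", "My_Bucket", "-edge-data", "ab"):
    print(candidate, is_valid_bucket_name(candidate))
```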

Folder Structure Created

The S3 Provisioner creates this ML-optimized folder structure:

solutions/
  <solution-name>/
    data/
      raw/                    # Raw ingested data
      curated/               # Cleaned and validated data
      processed/             # Feature-engineered training data
      inference/             # Prediction results
    models/                  # Trained model artifacts
    notebooks/               # Jupyter notebooks
    artifacts/               # Training artifacts
    code/                    # Source code
    config/                  # Configuration files

Lifecycle Policy Application:

  • All lifecycle profiles apply to the entire solutions/ prefix, so every object under solutions/<solution-name>/ (data/, models/, notebooks/, code/, etc.) follows the same transition and expiration schedule

  • Objects stored outside the solutions/ prefix are not affected by the profile
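The tree above can be expressed as a flat list of folder keys. In this hypothetical sketch (the helper and the solution name `churn-model` are illustrative), counting `solutions/`, the solution folder, and every sub-folder gives 12, which matches the `TotalFolders` example value, although the tool's actual counting method is an assumption here. The final check shows why a single `solutions/` prefix filter covers every folder.

```python
from typing import List

# Sub-folders shown in the tree above.
TOP_LEVEL = ["data", "models", "notebooks", "artifacts", "code", "config"]
DATA_STAGES = ["raw", "curated", "processed", "inference"]


def folder_keys(solution_name: str) -> List[str]:
    """List the folder keys for one solution (hypothetical helper)."""
    base = f"solutions/{solution_name}/"
    keys = ["solutions/", base]
    for folder in TOP_LEVEL:
        keys.append(f"{base}{folder}/")
        if folder == "data":
            keys.extend(f"{base}data/{stage}/" for stage in DATA_STAGES)
    return keys


keys = folder_keys("churn-model")
print(len(keys))
# Every key starts with the lifecycle rule's prefix, so one filter covers them all.
print(all(key.startswith("solutions/") for key in keys))
```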

Configuration Validation

The tool validates configurations before deployment:

Client Section Validation

  • ✅ company_name: Not empty

  • ✅ company_prefix: Lowercase, no spaces

  • ✅ account_id: 12 digits, quoted

  • ✅ tenant_id: Not empty, quoted

Environment Section Validation

  • ✅ env: One of [prod, dev, test, staging]

  • ✅ region: Valid AWS region

S3 Section Validation

  • ✅ bucket_name_override: Empty or valid S3 bucket name

  • ✅ versioning: Boolean (true/false)

  • ✅ lifecycle_policy: One of [ml-optimized, compliance, development, none]

  • ✅ vpc_id: Empty or valid VPC ID format

  • ✅ route_table_ids: Empty or comma-separated route table IDs

  • ✅ tags: Valid key-value pairs (optional)
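The checks above can be sketched as a function that collects every error instead of stopping at the first. `validate_config` is hypothetical, covers the client and environment sections plus `versioning` and `lifecycle_policy`, and only checks the shape of `region`; the real tool may validate regions against an authoritative list. It assumes the YAML has already been parsed into dictionaries, which is why an unquoted `account_id` shows up as an `int`.

```python
import re
from typing import Any, List, Mapping

VALID_ENVS = {"prod", "dev", "test", "staging"}
VALID_POLICIES = {"ml-optimized", "compliance", "development", "none"}
# Shape check only (e.g. us-west-1, ap-southeast-2); not a list of real regions.
REGION_RE = re.compile(r"[a-z]{2}(?:-[a-z]+)+-[0-9]")


def validate_config(cfg: Mapping[str, Any]) -> List[str]:
    """Return every validation error found; an empty list means the config passed."""
    errors: List[str] = []
    client = cfg.get("client") or {}
    env = cfg.get("environment") or {}
    s3 = cfg.get("s3") or {}

    if not client.get("company_name"):
        errors.append("company_name: must not be empty")
    prefix = client.get("company_prefix")
    if not (isinstance(prefix, str) and prefix and prefix == prefix.lower() and " " not in prefix):
        errors.append("company_prefix: must be lowercase with no spaces")
    account_id = client.get("account_id")
    # Unquoted in YAML, the account ID arrives as an int and fails this check.
    if not (isinstance(account_id, str) and re.fullmatch(r"[0-9]{12}", account_id)):
        errors.append("account_id: must be a quoted 12-digit string")
    tenant_id = client.get("tenant_id")
    if not (isinstance(tenant_id, str) and tenant_id):
        errors.append("tenant_id: must be a non-empty quoted string")

    if env.get("env") not in VALID_ENVS:
        errors.append(f"env: must be one of {sorted(VALID_ENVS)}")
    region = env.get("region")
    if not (isinstance(region, str) and REGION_RE.fullmatch(region)):
        errors.append("region: not a valid AWS region name")

    if not isinstance(s3.get("versioning", False), bool):
        errors.append("versioning: must be unquoted true or false")
    if s3.get("lifecycle_policy", "none") not in VALID_POLICIES:
        errors.append(f"lifecycle_policy: must be one of {sorted(VALID_POLICIES)}")
    return errors


example = {
    "client": {"company_name": "Edge Corp", "company_prefix": "edge",
               "account_id": 123456789012, "tenant_id": "a001"},  # unquoted ID
    "environment": {"env": "prod", "region": "us-west-1"},
    "s3": {"versioning": True, "lifecycle_policy": "ml-optimized"},
}
print(validate_config(example))
```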

Configuration Best Practices

1. Production Environments

environment:
  env: prod

s3:
  versioning: true              # Always enable
  lifecycle_policy: ml-optimized # or compliance
  tags:
    Environment: production
    Compliance: required        # If applicable

2. Development Environments

environment:
  env: dev

s3:
  versioning: false             # Optional for dev
  lifecycle_policy: development # Aggressive cleanup
  tags:
    Environment: development

3. Compliance Workloads

s3:
  versioning: true              # Required
  lifecycle_policy: compliance  # 7-year retention
  vpc_id: "vpc-xxx"            # Isolate network access
  route_table_ids: "rtb-xxx"
  tags:
    Compliance: HIPAA           # or PCI, SOC2, etc.
    DataClassification: PHI

4. Cost Optimization

s3:
  lifecycle_policy: ml-optimized  # ~60-70% illustrative storage savings
  # OR
  # lifecycle_policy: development  # Objects deleted after 90 days

5. S3 Gateway VPC Endpoint Configuration

s3:
  vpc_id: "vpc-0a1b2c3d4e5f6a7b8"
  route_table_ids: "rtb-0a1b2c3d,rtb-4e5f6a7b"

When to use S3 Gateway endpoints:

  • Enhanced security (private network access, no internet gateway required)

  • Lower costs (no endpoint charges; same-region S3 traffic avoids NAT gateway data processing fees)

  • Compliance requirements (HIPAA, PCI-DSS requiring no internet access)

  • Private subnet workloads (EC2, Lambda, SageMaker without NAT gateway)

Configuration requirements:

  • Both vpc_id and route_table_ids must be provided together

  • Route tables must belong to the specified VPC

  • Gateway endpoint is created automatically during bucket provisioning

  • No additional charges for S3 Gateway endpoints

Lifecycle Policy Cost Comparison

⚠️ IMPORTANT DISCLAIMER
The following cost estimates are illustrative examples only and should not be used for budgeting or financial planning.
Actual costs will vary significantly based on:

  • Your specific usage patterns and data access frequency

  • AWS region (pricing varies by region)

  • Current AWS pricing (subject to change)

  • Data transfer costs and request volumes

  • Storage class transition timing

Always use the AWS Pricing Calculator for accurate cost projections specific to your use case.

Scenario: 100 TB ML Pipeline (Annual)

Without Lifecycle Policy (lifecycle_policy: none):

  • 100 TB in STANDARD: ~$2,300/month = $27,600/year

With ml-optimized Profile:

  • 10 TB STANDARD (active): $230/month

  • 30 TB STANDARD_IA (recent): $375/month

  • 60 TB GLACIER (archive): $240/month

  • Total: $845/month = $10,140/year

  • Savings: $17,460/year (63%)

With compliance Profile:

  • 10 TB STANDARD (active): $230/month

  • 90 TB GLACIER (archive): $360/month

  • Total: $590/month = $7,080/year

  • Savings: $20,520/year (74%)

With development Profile:

  • Data deleted after 90 days

  • Minimal storage costs

  • Savings: ~100% (for old data)
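The figures above follow from simple per-GB arithmetic. Working backwards from them gives rates of $0.023 (STANDARD), $0.0125 (STANDARD_IA), and $0.004 (GLACIER) per GB-month, with 1 TB counted as 1,000 GB. The sketch below reproduces the scenario with those implied rates, which are illustrative and not current AWS pricing; it covers storage only, not requests, retrievals, or transitions.

```python
from typing import Dict

# Per-GB-month rates implied by the figures above; 1 TB counted as 1,000 GB.
RATES: Dict[str, float] = {"STANDARD": 0.023, "STANDARD_IA": 0.0125, "GLACIER": 0.004}


def monthly_cost(tb_by_class: Dict[str, float]) -> float:
    """Monthly storage cost in USD for a mix of storage classes (storage only)."""
    return sum(tb * 1000 * RATES[cls] for cls, tb in tb_by_class.items())


scenarios = {
    "none": {"STANDARD": 100},
    "ml-optimized": {"STANDARD": 10, "STANDARD_IA": 30, "GLACIER": 60},
    "compliance": {"STANDARD": 10, "GLACIER": 90},
}
baseline = monthly_cost(scenarios["none"]) * 12
for name, mix in scenarios.items():
    annual = monthly_cost(mix) * 12
    print(f"{name:<13} ${annual:>9,.0f}/year  savings {1 - annual / baseline:.0%}")
```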

YAML Syntax Tips

Strings with Numbers: Always quote numeric strings to preserve formatting

✅ account_id: "123456789012"  # Stays a string; leading zeros preserved
❌ account_id: 123456789012     # Parsed as an integer; leading zeros would be lost

Booleans: Use lowercase true/false without quotes

✅ versioning: true
❌ versioning: "true"  # This is a string, not a boolean

Empty Strings: Use empty quotes for optional string parameters

✅ bucket_name_override: ""
❌ bucket_name_override:        # This is null, not empty string

Indentation: Use 2 spaces (not tabs) for YAML indentation

✅ s3:
  versioning: true
❌ s3:
  versioning: true  # If this line is indented with a tab instead of spaces, parsing fails

Troubleshooting Configuration Issues

Issue: Validation Fails

Error: Invalid lifecycle_policy value

Solution: Ensure lifecycle_policy is one of: ml-optimized, compliance, development, none

# ❌ Wrong
s3:
  lifecycle_policy: custom

# ✅ Correct
s3:
  lifecycle_policy: ml-optimized

Issue: Bucket Name Conflict

Error: Bucket name already exists

Solution: Use bucket_name_override with a unique name

s3:
  bucket_name_override: "my-unique-bucket-name-2024"

Issue: Account ID Format

Error: Invalid account_id format

Solution: Quote the account_id to preserve leading zeros

# ❌ Wrong
client:
  account_id: 123456789012

# ✅ Correct
client:
  account_id: "123456789012"

Issue: VPC Endpoint Configuration

Error: Invalid vpc_id format

Solution: Use correct VPC ID format or empty string

# ❌ Wrong
s3:
  vpc_id: vpc-123

# ✅ Correct
s3:
  vpc_id: "vpc-0a1b2c3d4e5f6a7b8"
  # OR
  # vpc_id: ""

Additional Resources

  • User Guide - Complete command reference

  • S3 Folder Structure - Complete folder hierarchy reference

  • Governance & Compliance - Enterprise governance implementation guide

  • ML Lifecycle Policies - Custom lifecycle implementation

  • IAM Permissions - Required AWS permissions

  • Troubleshooting - Common issues and solutions

  • Release Notes - Version history and changes

Configuration File Locations

Development:

packages/s3-provisioner-tool/configs/
  edge-prod-a001-us-west-1-s3.yaml
  edge-dev-a002-us-west-2-s3.yaml

Docker Container:

/app/configs/
  your-config.yaml

Mount Example:

docker run --rm \
  -v "$(pwd)/s3/configs:/app/configs" \
  s3-provisioner:latest \
  --config your-config.yaml \
  --action validate-config

Copyright © 2025 Axon Tech Labs. All rights reserved.

See LICENSE.txt for terms and conditions.