Configuration Reference¶
Quick Reference¶
| Parameter | Required | Type | Default | Valid Values |
|---|---|---|---|---|
| client.company_name | ✅ | string | - | Any string |
| client.company_prefix | ✅ | string | - | Lowercase, no spaces |
| client.account_id | ✅ | string | - | 12 digits (quoted) |
| client.tenant_id | ✅ | string | - | 4 alphanumeric (quoted) |
| environment.env | ✅ | string | - | prod, dev, test, staging |
| environment.region | ✅ | string | - | Valid AWS region |
| s3.bucket_name_override | ❌ | string | "" | Valid S3 name or "" |
| s3.versioning | ✅ | boolean | false | true, false |
| s3.lifecycle_policy | ❌ | string | none | ml-optimized, compliance, development, none |
| s3.vpc_id | ❌ | string | "" | vpc-xxx or "" |
| s3.route_table_ids | ❌ | string | "" | rtb-xxx,rtb-yyy or "" |
| s3.tags | ❌ | object | {} | Key-value pairs |
Configuration File Structure¶
The S3 Provisioner uses a YAML configuration file with three main sections:
client:
  company_name: Edge Corp
  company_prefix: edge
  account_id: "123456789012"
  tenant_id: "a001"
environment:
  env: prod
  region: us-west-1
s3:
  bucket_name_override: ""
  versioning: false
  lifecycle_policy: ml-optimized
  vpc_id: ""
  route_table_ids: ""
  tags:
    Purpose: ML Solutions Portfolio
    ManagedBy: CloudFormation
    Owner: data-science
Section 1: Client Configuration¶
company_name ✅ Required¶
Type: string
Description: Full company name
Example:
Edge Corp, Acme Corporation
company_prefix ✅ Required¶
Type: string
Description: Short company identifier (lowercase, no spaces)
Constraints: Used in bucket naming
Example:
edge, acme
account_id ✅ Required¶
Type: string (quoted)
Description: 12-digit AWS account ID
Format: Must be quoted to preserve leading zeros
Example:
"123456789012"
tenant_id ✅ Required¶
Type: string (quoted)
Description: Human-readable account identifier
Format: Must be quoted, 4 alphanumeric characters
Example:
"a001"
Section 2: Environment Configuration¶
env ✅ Required¶
Type: string
Description: Environment name
Valid Values:
prod, dev, test, staging
Example:
prod
region ✅ Required¶
Type: string
Description: AWS region for S3 bucket
Valid Values: Any valid AWS region (us-east-1, us-west-2, eu-west-1, etc.)
Example:
us-west-1
Section 3: S3 Configuration¶
bucket_name_override ❌ Optional¶
Type: string
Default: Auto-generated from client/environment values
Description: Override auto-generated bucket name
Format: Must follow S3 bucket naming rules (lowercase, no underscores)
s3:
  bucket_name_override: "" # Use auto-generated name
  # OR
  bucket_name_override: "my-custom-bucket-name"
versioning ✅ Required¶
Type: boolean
Default: false
Description: Enable S3 bucket versioning
Note: Versioning is disabled by default. Set to true for production environments.
s3:
  versioning: false # Versioning disabled (default)
  # OR
  versioning: true # Versioning enabled (recommended for production)
Production Recommendation: Always enable versioning in production to:
Protect against accidental deletions
Maintain object history for compliance
Enable point-in-time recovery
Support disaster recovery scenarios
lifecycle_policy ❌ Optional¶
Type: string
Default: none
Description: Automated ML-optimized lifecycle policy profile
Valid Values:
ml-optimized, compliance, development, none
s3:
  lifecycle_policy: ml-optimized # or compliance, development, none
Lifecycle Policy Profiles¶
ml-optimized - Production ML workloads with cost optimization
Transitions: STANDARD → STANDARD_IA (30 days) → GLACIER (90 days)
Expiration: None (data retained indefinitely)
Applies to: All data under the solutions/ prefix
Use case: Active ML pipelines with long-term data retention
compliance - HIPAA/PCI regulated industries
Transitions: STANDARD → GLACIER (90 days)
Expiration: 2555 days (7 years)
Applies to: All data under the solutions/ prefix
Use case: Regulated data with mandatory retention periods
development - Dev/staging environments
Transitions: None
Expiration: 90 days
Applies to: All data under the solutions/ prefix
Use case: Temporary development/testing data
none - No lifecycle rules (default)
Transitions: None
Expiration: None
Applies to: N/A
Use case: Manual lifecycle management or no lifecycle needed
Lifecycle Policy Details¶
| Profile | 30 Days | 90 Days | Expiration | Cost Savings |
|---|---|---|---|---|
| ml-optimized | → STANDARD_IA | → GLACIER | Never | ~60-70% |
| compliance | - | → GLACIER | 7 years | ~70-80% |
| development | - | - | 90 days | ~100% (deleted) |
| none | - | - | Never | 0% |
Note: For custom lifecycle rules beyond these profiles, see ML_LIFECYCLE_POLICIES.md for manual implementation guidance.
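As a rough illustration, the four profiles can be written as S3 lifecycle rules. The sketch below builds rule dictionaries in the shape used by the S3 lifecycle configuration API (ID, Status, Filter, Transitions, Expiration). The function name `lifecycle_rules`, the `PROFILES` table, and the rule IDs are hypothetical; the template the tool actually generates may differ.

```python
# Hypothetical profile table: name -> (transitions, expiration in days).
# Values are taken from the profile descriptions above.
PROFILES = {
    "ml-optimized": ([(30, "STANDARD_IA"), (90, "GLACIER")], None),
    "compliance": ([(90, "GLACIER")], 2555),
    "development": ([], 90),
    "none": ([], None),
}


def lifecycle_rules(profile: str, prefix: str = "solutions/") -> list[dict]:
    """Return lifecycle rules for a profile; an empty list means no rules."""
    if profile not in PROFILES:
        raise ValueError(f"Invalid lifecycle_policy value: {profile!r}")
    transitions, expiration_days = PROFILES[profile]
    if not transitions and expiration_days is None:
        return []  # the "none" profile creates no rules
    rule = {
        "ID": f"{profile}-rule",  # hypothetical rule ID
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
    }
    if transitions:
        rule["Transitions"] = [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in transitions
        ]
    if expiration_days is not None:
        rule["Expiration"] = {"Days": expiration_days}
    return [rule]


print(lifecycle_rules("compliance"))
```

Keeping the profiles in one table means the valid values for lifecycle_policy and the generated rules cannot drift apart.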
vpc_id ❌ Optional¶
Type: string
Description: VPC ID for S3 Gateway VPC endpoint configuration
Format: vpc-xxxxxxxxxxxxxxxxx or empty string
Purpose: Creates an S3 Gateway endpoint to enable private S3 access from within the VPC without an internet gateway
s3:
  vpc_id: "" # No VPC endpoint
  # OR
  vpc_id: "vpc-0a1b2c3d4e5f6g7h8" # Enable S3 Gateway endpoint
Benefits of S3 Gateway Endpoint:
Private connectivity to S3 without internet gateway
No data transfer charges for S3 access within the same region
Enhanced security by keeping traffic within AWS network
Required for compliance scenarios that prohibit internet access
route_table_ids ❌ Optional¶
Type: string
Description: Comma-separated route table IDs to associate with the S3 Gateway endpoint
Format: rtb-xxx,rtb-yyy or empty string
Required: Must be provided when vpc_id is specified
Purpose: Defines which subnets can access S3 through the gateway endpoint
s3:
  route_table_ids: "" # No route tables (vpc_id must also be empty)
  # OR
  route_table_ids: "rtb-0a1b2c3d,rtb-4e5f6g7h" # Multiple route tables
Note: Both vpc_id and route_table_ids must be configured together to create an S3 Gateway endpoint. If one is provided, the other must also be provided.
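The pairing rule can be checked before deployment. The following is a minimal sketch, not the tool's own validator: the function name `check_endpoint_settings` is hypothetical, and it only checks the `vpc-` and `rtb-` prefixes rather than the full ID format.

```python
def check_endpoint_settings(vpc_id: str, route_table_ids: str) -> list[str]:
    """Return the route table IDs for the endpoint, or [] if no endpoint is requested."""
    # Split the comma-separated string and drop empty entries.
    route_tables = [r.strip() for r in route_table_ids.split(",") if r.strip()]

    # Either both settings are provided or both are empty.
    if bool(vpc_id) != bool(route_tables):
        raise ValueError("vpc_id and route_table_ids must be provided together")

    # Prefix-only checks (an assumption; real ID validation may be stricter).
    if vpc_id and not vpc_id.startswith("vpc-"):
        raise ValueError(f"Invalid vpc_id format: {vpc_id!r}")
    bad = [r for r in route_tables if not r.startswith("rtb-")]
    if bad:
        raise ValueError(f"Invalid route table IDs: {bad}")
    return route_tables


print(check_endpoint_settings("vpc-0a1b2c3d", "rtb-0a1b2c3d, rtb-4e5f6a7b"))
```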
Usage Assumptions File¶
The cost-traffic action generates a usage assumptions YAML file used by cost-estimate to calculate S3 infrastructure costs. This file is saved in the configs/ directory.
File Naming¶
<bucket-name>-usage.yaml
Example: edge-prod-b001-us-west-1-s3-usage.yaml
File Structure¶
# Auto-generated S3 usage assumptions for cost estimation
# Edit values to match your expected monthly usage
usage:
  storage:
    storage_class: Standard
    data_gb: 100
  requests:
    put_requests_per_month: 10000
    get_requests_per_month: 50000
  transfer:
    data_out_gb_per_month: 10
  vpc_endpoint:
    S3VPCEndpoint:
      type: AWS::EC2::VPCEndpoint
      data_gb_per_month: 50
Parameters¶
storage_class
Type: string
Generated: Yes (do not modify)
Description: S3 storage class used for pricing lookup
data_gb
Type: integer
Generated: Yes (with default of 100)
Description: Expected total storage in GB
Action: Edit to match your expected data volume
put_requests_per_month
Type: integer
Generated: Yes (with default of 10,000)
Description: Expected monthly PUT/COPY/POST/LIST requests
get_requests_per_month
Type: integer
Generated: Yes (with default of 50,000)
Description: Expected monthly GET and other requests
data_out_gb_per_month
Type: integer
Generated: Yes (with default of 10)
Description: Expected monthly data transfer out of AWS in GB
vpc_endpoint (present only if VPC Endpoint is in the template)
data_gb_per_month: Expected monthly data through the VPC Endpoint in GB
Usage¶
Run cost-traffic to generate the file with defaults
Edit values to reflect your expected storage, requests, and transfer
Run cost-estimate to calculate costs based on your assumptions
Re-edit and re-run to model different scenarios
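To show how these assumptions feed a cost figure, here is a minimal sketch of the arithmetic. It is not the cost-estimate implementation: the function `estimate_monthly_cost` and the `rates` keys are hypothetical, the rates are placeholders supplied by the caller, and VPC endpoint traffic is left out. It relies only on the fact that S3 bills storage and transfer per GB and requests per 1,000.

```python
def estimate_monthly_cost(usage: dict, rates: dict) -> float:
    """Estimate monthly S3 cost in USD from usage assumptions and caller-supplied rates."""
    storage = usage["storage"]["data_gb"] * rates["storage_per_gb"]
    requests = (
        usage["requests"]["put_requests_per_month"] / 1000 * rates["put_per_1000"]
        + usage["requests"]["get_requests_per_month"] / 1000 * rates["get_per_1000"]
    )
    transfer = usage["transfer"]["data_out_gb_per_month"] * rates["transfer_out_per_gb"]
    return storage + requests + transfer


# Default values from the generated usage file.
usage = {
    "storage": {"storage_class": "Standard", "data_gb": 100},
    "requests": {"put_requests_per_month": 10000, "get_requests_per_month": 50000},
    "transfer": {"data_out_gb_per_month": 10},
}

# Placeholder rates for illustration only; look up current prices for your region.
rates = {
    "storage_per_gb": 0.023,
    "put_per_1000": 0.005,
    "get_per_1000": 0.0004,
    "transfer_out_per_gb": 0.09,
}

print(f"${estimate_monthly_cost(usage, rates):.2f} per month")
```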
Complete Configuration Examples¶
Example 1: Production ML Workload¶
client:
  company_name: Edge Corp
  company_prefix: edge
  account_id: "123456789012"
  tenant_id: "a001"
environment:
  env: prod
  region: us-west-1
s3:
  bucket_name_override: ""
  versioning: true
  lifecycle_policy: ml-optimized
  vpc_id: ""
  route_table_ids: ""
  tags:
    Purpose: ML Solutions Portfolio
    ManagedBy: CloudFormation
    Owner: data-science
    CostCenter: ML-Team
    Project: MLOps-Suite
Result: Production bucket with versioning enabled, ml-optimized lifecycle (30d→IA, 90d→GLACIER), no expiration.
Example 2: Compliance Environment (HIPAA)¶
client:
  company_name: Healthcare Inc
  company_prefix: health
  account_id: "123456789012"
  tenant_id: "a002"
environment:
  env: prod
  region: us-east-1
s3:
  bucket_name_override: ""
  versioning: true
  lifecycle_policy: compliance
  vpc_id: "vpc-0a1b2c3d4e5f6g7h8"
  route_table_ids: "rtb-0a1b2c3d,rtb-4e5f6g7h"
  tags:
    Purpose: Patient Data Storage
    Compliance: HIPAA
    DataClassification: PHI
    Owner: compliance-team
    ManagedBy: CloudFormation
Result: Compliance bucket with versioning, 7-year retention (90d→GLACIER, expires after 2555 days), VPC endpoint.
Example 3: Development Environment¶
client:
  company_name: Acme Corp
  company_prefix: acme
  account_id: "123456789012"
  tenant_id: "a003"
environment:
  env: dev
  region: us-west-2
s3:
  bucket_name_override: ""
  versioning: false
  lifecycle_policy: development
  vpc_id: ""
  route_table_ids: ""
  tags:
    Purpose: Development Testing
    Owner: dev-team
Result: Development bucket with no versioning, 90-day expiration, minimal tags.
Example 4: Custom Bucket Name¶
client:
  company_name: Tech Startup
  company_prefix: tech
  account_id: "123456789012"
  tenant_id: "a004"
environment:
  env: prod
  region: eu-west-1
s3:
  bucket_name_override: "tech-ml-data-prod-eu"
  versioning: true
  lifecycle_policy: ml-optimized
  vpc_id: ""
  route_table_ids: ""
  tags:
    Purpose: ML Data Lake
    Owner: ml-platform-team
Result: Custom-named bucket with ml-optimized lifecycle.
Example 5: No Lifecycle Policy¶
client:
  company_name: Finance Corp
  company_prefix: finance
  account_id: "123456789012"
  tenant_id: "a005"
environment:
  env: prod
  region: us-east-1
s3:
  bucket_name_override: ""
  versioning: true
  lifecycle_policy: none
  vpc_id: ""
  route_table_ids: ""
  tags:
    Purpose: Financial Data
    Owner: finance-team
Result: Production bucket with versioning, no lifecycle rules (manual management).
Bucket Naming Convention¶
When bucket_name_override is empty, the tool auto-generates bucket names using this pattern:
{company_prefix}-{env}-{tenant_id}-{region}-s3
Examples:
edge-prod-a001-us-west-1-s3
acme-dev-a003-us-west-2-s3
health-prod-a002-us-east-1-s3
S3 Bucket Naming Rules:
3-63 characters
Lowercase letters, numbers, hyphens only
Must start/end with letter or number
No underscores, spaces, or uppercase
Globally unique across all AWS accounts
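The naming pattern and the rules above can be combined in a few lines. This is a sketch under stated assumptions: the function `bucket_name` is hypothetical, and it assumes the `-s3` suffix that appears in the example names and in the usage file name. Global uniqueness cannot be checked locally and is not covered.

```python
import re

# 3-63 characters; lowercase letters, digits, and hyphens only;
# must start and end with a letter or digit.
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def bucket_name(company_prefix: str, env: str, tenant_id: str,
                region: str, override: str = "") -> str:
    """Return the override if set, otherwise the auto-generated name."""
    name = override or f"{company_prefix}-{env}-{tenant_id}-{region}-s3"
    if not BUCKET_NAME.fullmatch(name):
        raise ValueError(f"Invalid S3 bucket name: {name!r}")
    return name


print(bucket_name("edge", "prod", "a001", "us-west-1"))
```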
Folder Structure Created¶
The S3 Provisioner creates this ML-optimized folder structure:
solutions/
  <solution-name>/
    data/
      raw/ # Raw ingested data
      curated/ # Cleaned and validated data
      processed/ # Feature-engineered training data
      inference/ # Prediction results
    models/ # Trained model artifacts
    notebooks/ # Jupyter notebooks
    artifacts/ # Training artifacts
    code/ # Source code
    config/ # Configuration files
Lifecycle Policy Application:
All lifecycle profiles apply to the entire solutions/ prefix
Rules affect all data under solutions/<solution-name>/, including data/, models/, notebooks/, code/, etc.
Lifecycle transitions apply uniformly to all objects under the prefix
Configuration Validation¶
The tool validates configurations before deployment:
Client Section Validation¶
✅ company_name: Not empty
✅ company_prefix: Lowercase, no spaces
✅ account_id: 12 digits, quoted
✅ tenant_id: Not empty, quoted
Environment Section Validation¶
✅ env: One of [prod, dev, test, staging]
✅ region: Valid AWS region
S3 Section Validation¶
✅ bucket_name_override: Empty or valid S3 bucket name
✅ versioning: Boolean (true/false)
✅ lifecycle_policy: One of [ml-optimized, compliance, development, none]
✅ vpc_id: Empty or valid VPC ID format
✅ route_table_ids: Empty or comma-separated route table IDs
✅ tags: Valid key-value pairs (optional)
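The checks listed above translate fairly directly into code. The sketch below is illustrative rather than the tool's validator: `validate_config` is a hypothetical name, it expects the configuration already parsed into a dict, and the region check only tests the general shape of a region name instead of consulting a list of real AWS regions.

```python
import re

VALID_ENVS = {"prod", "dev", "test", "staging"}
VALID_PROFILES = {"ml-optimized", "compliance", "development", "none"}


def validate_config(config: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the config passed."""
    errors = []
    client = config.get("client", {})
    environment = config.get("environment", {})
    s3 = config.get("s3", {})

    if not client.get("company_name"):
        errors.append("company_name must not be empty")

    prefix = client.get("company_prefix", "")
    if not isinstance(prefix, str) or not prefix or prefix != prefix.lower() or " " in prefix:
        errors.append("company_prefix must be lowercase with no spaces")

    # An unquoted account_id is parsed as an integer, so the type check catches it.
    account_id = client.get("account_id")
    if not (isinstance(account_id, str) and re.fullmatch(r"[0-9]{12}", account_id)):
        errors.append("account_id must be a quoted 12-digit string")

    tenant_id = client.get("tenant_id")
    if not (isinstance(tenant_id, str) and tenant_id):
        errors.append("tenant_id must be a non-empty quoted string")

    if environment.get("env") not in VALID_ENVS:
        errors.append("env must be one of prod, dev, test, staging")

    # Shape check only (assumption): e.g. us-west-1, eu-west-1.
    region = environment.get("region", "")
    if not (isinstance(region, str) and re.fullmatch(r"[a-z]{2}(-[a-z]+)+-[0-9]", region)):
        errors.append("region must look like an AWS region name")

    if not isinstance(s3.get("versioning"), bool):
        errors.append("versioning must be a boolean")

    if s3.get("lifecycle_policy", "none") not in VALID_PROFILES:
        errors.append("Invalid lifecycle_policy value")

    return errors
```

Returning every error at once, instead of stopping at the first, lets a user fix a configuration file in a single pass.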
Configuration Best Practices¶
1. Production Environments¶
environment:
  env: prod
s3:
  versioning: true # Always enable
  lifecycle_policy: ml-optimized # or compliance
  tags:
    Environment: production
    Compliance: required # If applicable
2. Development Environments¶
environment:
  env: dev
s3:
  versioning: false # Optional for dev
  lifecycle_policy: development # Aggressive cleanup
  tags:
    Environment: development
3. Compliance Workloads¶
s3:
  versioning: true # Required
  lifecycle_policy: compliance # 7-year retention
  vpc_id: "vpc-xxx" # Isolate network access
  route_table_ids: "rtb-xxx"
  tags:
    Compliance: HIPAA # or PCI, SOC2, etc.
    DataClassification: PHI
4. Cost Optimization¶
s3:
  lifecycle_policy: ml-optimized # 60-70% cost savings
  # OR
  lifecycle_policy: development # 100% savings (90d deletion)
5. S3 Gateway VPC Endpoint Configuration¶
s3:
  vpc_id: "vpc-0a1b2c3d4e5f6g7h8"
  route_table_ids: "rtb-0a1b2c3d,rtb-4e5f6g7h"
When to use S3 Gateway endpoints:
Enhanced security (private network access, no internet gateway required)
Zero data transfer costs (S3 access within same region)
Compliance requirements (HIPAA, PCI-DSS requiring no internet access)
Private subnet workloads (EC2, Lambda, SageMaker without NAT gateway)
Configuration requirements:
Both vpc_id and route_table_ids must be provided together
Route tables should belong to the specified VPC
Gateway endpoint is created automatically during bucket provisioning
No additional charges for S3 Gateway endpoints
Lifecycle Policy Cost Comparison¶
⚠️ IMPORTANT DISCLAIMER
The following cost estimates are illustrative examples only and should not be used for budgeting or financial planning.
Actual costs will vary significantly based on:
Your specific usage patterns and data access frequency
AWS region (pricing varies by region)
Current AWS pricing (subject to change)
Data transfer costs and request volumes
Storage class transition timing
Always use the AWS Pricing Calculator for accurate cost projections specific to your use case.
Scenario: 100 TB ML Pipeline (Annual)¶
Without Lifecycle Policy (lifecycle_policy: none):
100 TB in STANDARD: ~$2,300/month = $27,600/year
With ml-optimized Profile:
10 TB STANDARD (active): $230/month
30 TB STANDARD_IA (recent): $375/month
60 TB GLACIER (archive): $240/month
Total: $845/month = $10,140/year
Savings: $17,460/year (63%)
With compliance Profile:
10 TB STANDARD (active): $230/month
90 TB GLACIER (archive): $360/month
Total: $590/month = $7,080/year
Savings: $20,520/year (74%)
With development Profile:
Data deleted after 90 days
Minimal storage costs
Savings: ~100% (for old data)
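The figures in this scenario can be reproduced with a short calculation. The per-TB rates below are simply the ones implied by the numbers above ($230 for 10 TB of STANDARD, $375 for 30 TB of STANDARD_IA, $240 for 60 TB of GLACIER); they are illustrative and not current AWS prices. The names `RATE_PER_TB` and `annual_cost` are hypothetical.

```python
# Monthly USD per TB, derived from the illustrative figures above.
RATE_PER_TB = {"STANDARD": 23.0, "STANDARD_IA": 12.5, "GLACIER": 4.0}


def annual_cost(tb_by_class: dict) -> float:
    """Annual storage cost in USD for a given split of data across storage classes."""
    monthly = sum(RATE_PER_TB[cls] * tb for cls, tb in tb_by_class.items())
    return 12 * monthly


baseline = annual_cost({"STANDARD": 100})
scenarios = {
    "ml-optimized": annual_cost({"STANDARD": 10, "STANDARD_IA": 30, "GLACIER": 60}),
    "compliance": annual_cost({"STANDARD": 10, "GLACIER": 90}),
}

for name, cost in scenarios.items():
    savings = baseline - cost
    print(f"{name}: ${cost:,.0f}/year, saves ${savings:,.0f} ({savings / baseline:.0%})")
```

Changing the split of terabytes per storage class is the quickest way to see how sensitive the savings are to access patterns.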
YAML Syntax Tips¶
Strings with Numbers: Always quote numeric strings to preserve formatting
✅ account_id: "123456789012" # Preserves leading zeros
❌ account_id: 123456789012 # May lose leading zeros
Booleans: Use lowercase true/false without quotes
✅ versioning: true
❌ versioning: "true" # This is a string, not a boolean
Empty Strings: Use empty quotes for optional string parameters
✅ bucket_name_override: ""
❌ bucket_name_override: # This is null, not empty string
Indentation: Use 2 spaces (not tabs) for YAML indentation
✅ s3:
     versioning: true
❌ s3:
     versioning: true # Using tabs instead of spaces causes errors
Troubleshooting Configuration Issues¶
Issue: Validation Fails¶
Error: Invalid lifecycle_policy value
Solution: Ensure lifecycle_policy is one of: ml-optimized, compliance, development, none
# ❌ Wrong
s3:
  lifecycle_policy: custom
# ✅ Correct
s3:
  lifecycle_policy: ml-optimized
Issue: Bucket Name Conflict¶
Error: Bucket name already exists
Solution: Use bucket_name_override with a unique name
s3:
  bucket_name_override: "my-unique-bucket-name-2024"
Issue: Account ID Format¶
Error: Invalid account_id format
Solution: Quote the account_id to preserve leading zeros
# ❌ Wrong
client:
  account_id: 123456789012
# ✅ Correct
client:
  account_id: "123456789012"
Issue: VPC Endpoint Configuration¶
Error: Invalid vpc_id format
Solution: Use correct VPC ID format or empty string
# ❌ Wrong
s3:
  vpc_id: vpc-123
# ✅ Correct
s3:
  vpc_id: "vpc-0a1b2c3d4e5f6g7h8"
  # OR
  vpc_id: ""
Additional Resources¶
User Guide - Complete command reference
S3 Folder Structure - Complete folder hierarchy reference
Governance & Compliance - Enterprise governance implementation guide
ML Lifecycle Policies - Custom lifecycle implementation
IAM Permissions - Required AWS permissions
Troubleshooting - Common issues and solutions
Release Notes - Version history and changes
Configuration File Locations¶
Development:
packages/s3-provisioner-tool/configs/
  edge-prod-a001-us-west-1-s3.yaml
  edge-dev-a002-us-west-2-s3.yaml
Docker Container:
/app/configs/
  your-config.yaml
Mount Example:
docker run --rm \
  -v "$(pwd)/s3/configs:/app/configs" \
  s3-provisioner:latest \
  --config your-config.yaml \
  --action validate-config
Copyright © 2025 Axon Tech Labs All rights reserved.
See LICENSE.txt for terms and conditions.