README¶
Enterprise-grade S3 infrastructure provisioning tool purpose-built for machine learning workloads.
Why This Tool Exists¶
Most S3 provisioning tools create empty buckets. This tool creates production-ready ML pipeline infrastructure with a complete folder structure for:
Data ingestion (raw → curated → processed → inference)
Model training, evaluation, and registry
Feature engineering and feature stores
Notebooks, artifacts, logs, and reports
Code, configurations, and monitoring
One command deploys 130+ folders organized for enterprise ML operations.
Enterprise-Scale Versatility¶
| Functionality | Description |
|---|---|
| Multi-Tenant | Deploy isolated ML infrastructure for multiple teams or clients |
| Multi-Region | Replicate infrastructure across AWS regions (us-west-1, eu-west-1, ap-southeast-1, etc.) |
| Multi-Environment | Separate dev, staging, and production environments with identical structure |
| Multi-Company | Support multiple companies with distinct configurations and branding |
| Flexible Bucket Naming | Auto-generate standardized names ( |
| Multi-Solution Support | Deploy multiple ML solutions (customer-churn, fraud-detection, demand-forecasting) in a single shared bucket or dedicated buckets per solution |
| VPC Integration | Seamlessly integrates with VPC Provisioner to enable private S3 access via VPC endpoints - all traffic stays within your private cloud with no internet exposure |
| Configuration-Driven Structure | Control the entire S3 folder hierarchy through simple YAML configuration - clients define bucket names, lifecycle policies, versioning, tags, and VPC settings without touching code |
Example:
Deploy customer-churn solution for 3 companies × 2 regions × 3 environments = 18 isolated buckets with one configuration template.
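The arithmetic in this example can be checked by enumerating the combinations. The following Python sketch is illustrative only: the company prefixes `globex` and `initech`, their tenant IDs, and the second region are hypothetical, and the naming pattern `<prefix>-<env>-<tenant>-<region>-s3` is an assumption inferred from the bucket name shown in the Quick Start result, not a documented rule.

```python
from itertools import product

# Hypothetical inputs: only "acme"/"a001", "prod", and "us-west-1" appear in this README.
COMPANIES = {"acme": "a001", "globex": "b001", "initech": "c001"}  # prefix -> tenant_id
REGIONS = ["us-west-1", "eu-west-1"]
ENVIRONMENTS = ["dev", "staging", "prod"]


def bucket_names(companies, regions, environments):
    """Return one bucket name per (company, region, environment) combination.

    The pattern <prefix>-<env>-<tenant>-<region>-s3 is an assumption based on
    the example name acme-prod-a001-us-west-1-s3.
    """
    return [
        f"{prefix}-{env}-{tenant}-{region}-s3"
        for (prefix, tenant), region, env in product(
            companies.items(), regions, environments
        )
    ]


names = bucket_names(COMPANIES, REGIONS, ENVIRONMENTS)
print(len(names))  # 3 companies x 2 regions x 3 environments
```

Each generated name would correspond to one configuration file derived from the shared template, with one deployment run per file.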
What You Get¶
Battle-Tested ML Folder Structure¶
Saves weeks of infrastructure design time. Instead of designing folder hierarchies from scratch, you get a comprehensive, production-ready structure that covers:
Complete ML Pipeline: Raw → Curated → Processed → Inference data flow
Universal Applicability: Works for any ML domain (computer vision, NLP, time series, recommendation systems, fraud detection, etc.)
130+ Organized Folders: Data, models, notebooks, artifacts, code, configs, and monitoring
Enterprise-Ready: Built-in support for governance, compliance, audit trails, and data lineage
Fully Customizable: Use as-is or adapt to your specific needs - remove unused folders or add custom ones
See S3_FOLDERS.md for complete folder structure reference.
Bonus: Enterprise Governance Blueprint¶
Beyond infrastructure provisioning, you get a complete reference architecture for implementing governance, compliance, and audit capabilities:
Ready-to-use JSON schemas for audit logs, data lineage, and compliance metadata
Multi-framework compliance support (GDPR, HIPAA, SOC 2, ISO 27001, CCPA)
Role-based access control (RBAC) examples with sample access patterns
Query templates for audit trail analysis
Implementation checklist with AWS service recommendations
See GOVERNANCE_COMPLIANCE.md for the complete governance framework.
ML-Optimized Folder Structure¶
solutions/
  customer-churn/
    data/
      raw/                    # Ingested data with date partitioning
      curated/                # Cleaned and validated data
      processed/              # Feature-engineered training data
        train/
        validation/
        test/
        feature_engineering/
      inference/              # Batch and realtime predictions
    models/
      experiments/            # Experiment tracking
      training/               # Trained models by algorithm
      evaluation/             # Model comparison and monitoring
      registry/               # Production/staging/dev model versions
    notebooks/                # Jupyter notebooks by phase
    artifacts/                # Logs, checkpoints, visualizations, reports
    code/                     # Pipeline code and tests
    config/                   # Environment and model configurations
Automated Lifecycle Policies¶
4 pre-configured profiles for cost optimization:
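ml-optimized: transition to Infrequent Access (IA) after 30 days, then to Glacier after 90 days (60-70% cost savings)
compliance: transition to Glacier after 90 days, with 7-year retention (70-80% savings)
development: objects expire after 90 days (no storage cost once they expire)
none: no lifecycle rules, manual management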
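To make the profiles concrete, here is a minimal Python sketch of how each one might translate into an S3 lifecycle configuration. The dict layout follows the request body of the S3 `PutBucketLifecycleConfiguration` API. The function name `lifecycle_rules`, the storage class `STANDARD_IA` for "IA", and reading "7-year retention" as expiration after 7 × 365 days are all assumptions. The rules the tool actually generates are described in ML_LIFECYCLE_POLICIES.md.

```python
def lifecycle_rules(profile):
    """Map a profile name to an S3 lifecycle configuration dict (or None).

    Storage class names and the 7-year expiration are assumptions made for
    illustration; they are not taken from the tool's templates.
    """
    base = {
        "ID": profile,
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: the rule covers every object
    }
    if profile == "ml-optimized":
        base["Transitions"] = [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ]
    elif profile == "compliance":
        base["Transitions"] = [{"Days": 90, "StorageClass": "GLACIER"}]
        base["Expiration"] = {"Days": 7 * 365}
    elif profile == "development":
        base["Expiration"] = {"Days": 90}
    elif profile == "none":
        return None  # no lifecycle configuration is applied at all
    else:
        raise ValueError(f"unknown lifecycle profile: {profile}")
    return {"Rules": [base]}


print(lifecycle_rules("ml-optimized"))
```

A dict like this could be passed to boto3's `put_bucket_lifecycle_configuration` as the `LifecycleConfiguration` argument, although in this tool the policies are deployed through CloudFormation.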
Infrastructure as Code¶
CloudFormation-based deployment with stack outputs for multi-stack orchestration
One-command cleanup via CloudFormation stack deletion
67 leaf folders created via CloudFormation templates
Lambda-based folder creation for remaining structure
IAM policies auto-generated
VPC endpoint support
Automated tagging (7 system + custom tags)
Local template validation (YAML syntax, structure, reference integrity)
Infrastructure drift detection against deployed stacks
Change preview via CloudFormation ChangeSets
Safe test deployments with isolated resource names
Built-in cost estimation with region-specific pricing
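S3 has no real directories. A "folder" is conventionally a zero-byte object whose key ends in `/`, and creating a leaf key makes all of its parent prefixes visible as well. That is why the list above counts leaf folders. The Python sketch below shows one way to derive leaf keys from a nested structure. The tree is abbreviated and illustrative, and `leaf_keys` is a hypothetical helper, not part of the tool.

```python
# Abbreviated, illustrative tree; S3_FOLDERS.md documents the real structure.
TREE = {
    "solutions": {
        "customer-churn": {
            "data": {"raw": {}, "curated": {}, "inference": {}},
            "models": {
                "experiments": {},
                "training": {},
                "evaluation": {},
                "registry": {},
            },
            "notebooks": {},
        }
    }
}


def leaf_keys(tree, prefix=""):
    """Yield the S3 key of every leaf folder; each key ends with '/'."""
    for name, children in tree.items():
        key = f"{prefix}{name}/"
        if children:
            # Intermediate folder: recurse, do not emit a key of its own.
            yield from leaf_keys(children, key)
        else:
            yield key


keys = sorted(leaf_keys(TREE))
for key in keys:
    print(key)
```

Each key could then be written as an empty object, for example with boto3's `put_object(Bucket=..., Key=key)`. How the CloudFormation templates and the Lambda function split this work internally is not described in this README.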
Quick Start¶
1. Create Configuration¶
configs/my-ml-project.yaml:
client:
  company_name: "Acme Corp"
  company_prefix: "acme"
  account_id: "123456789012"
  tenant_id: "a001"
environment:
  env: "prod"
  region: "us-west-1"
s3:
  bucket_name_override: ""
  versioning: true
  lifecycle_policy: "ml-optimized"
  vpc_id: ""
  route_table_ids: ""
tags:
  Project: "Customer Churn ML"
  Owner: "data-science-team"
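Before deploying, it can help to sanity-check a configuration locally. The Python sketch below mirrors the file as a plain dict to avoid a YAML dependency and applies a few checks. The helper names, the default naming pattern (inferred from the example bucket name `acme-prod-a001-us-west-1-s3`), and the choice of checks are assumptions. The tool's own `validate-config` action remains the authoritative check.

```python
import re

# Mirrors configs/my-ml-project.yaml; the tags section is omitted for brevity.
CONFIG = {
    "client": {
        "company_prefix": "acme",
        "account_id": "123456789012",
        "tenant_id": "a001",
    },
    "environment": {"env": "prod", "region": "us-west-1"},
    "s3": {
        "bucket_name_override": "",
        "versioning": True,
        "lifecycle_policy": "ml-optimized",
    },
}

# S3 bucket names: 3-63 characters; lowercase letters, digits, hyphens, dots;
# must start and end with a letter or digit.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
PROFILES = {"ml-optimized", "compliance", "development", "none"}


def resolve_bucket_name(config):
    """Use the override when set, otherwise build the assumed default name."""
    override = config["s3"]["bucket_name_override"]
    if override:
        return override
    client, env = config["client"], config["environment"]
    return (
        f"{client['company_prefix']}-{env['env']}-"
        f"{client['tenant_id']}-{env['region']}-s3"
    )


def check_config(config):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not re.fullmatch(r"\d{12}", config["client"]["account_id"]):
        problems.append("account_id must be exactly 12 digits")
    if config["s3"]["lifecycle_policy"] not in PROFILES:
        problems.append("unknown lifecycle_policy")
    if not BUCKET_RE.fullmatch(resolve_bucket_name(config)):
        problems.append("bucket name violates S3 naming rules")
    return problems


print(resolve_bucket_name(CONFIG), check_config(CONFIG))
```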
2. Deploy Master Solution¶
docker run --rm \
-e AWS_PROFILE=default \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs \
-v $(pwd)/s3/reports:/app/reports \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action prep-master \
--solution master-solution \
--force
Result: Bucket acme-prod-a001-us-west-1-s3 created with complete ML folder structure.
3. Deploy Additional Solutions¶
# Deploy customer churn solution
docker run --rm \
-e AWS_PROFILE=default \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs \
-v $(pwd)/s3/reports:/app/reports \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action deploy-solution \
--solution customer-churn
Available Solutions: customer-churn, demand-forecasting, fraud-detection
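Because every action uses the same `docker run` skeleton, a thin wrapper can cut down on copy-paste errors when scripting several deployments. The Python sketch below only assembles the argument list from the flags shown in this README. The function name `provisioner_command` and its parameters are hypothetical. Actions that need extra mounts (for example `/app/templates` or `/app/policies`) can pass them through `extra_volumes`.

```python
import os
import shlex


def provisioner_command(action, config, solution=None, force=False,
                        workdir=None, extra_volumes=()):
    """Build the docker argv for one provisioner action (nothing is executed)."""
    workdir = workdir or os.getcwd()
    aws_dir = os.path.expanduser("~/.aws")
    volumes = [
        f"{aws_dir}:/home/s3user/.aws:ro",
        f"{workdir}/s3/configs:/app/configs",
        f"{workdir}/s3/reports:/app/reports",
        *extra_volumes,
    ]
    argv = ["docker", "run", "--rm", "-e", "AWS_PROFILE=default"]
    for volume in volumes:
        argv += ["-v", volume]
    argv += ["s3-provisioner:latest", "--config", config, "--action", action]
    if solution:
        argv += ["--solution", solution]
    if force:
        argv.append("--force")
    return argv


cmd = provisioner_command(
    "deploy-solution", "my-ml-project.yaml",
    solution="customer-churn", workdir="/work",
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the deployment. Building a list rather than a shell string keeps paths that contain spaces intact.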
Common Workflows¶
Pattern B: Dedicated Buckets (One Solution Per Bucket)¶
# customer-churn-config.yaml
s3:
  bucket_name_override: "acme-prod-a001-us-west-1-customer-churn"
--action prep-master --solution customer-churn --force
Result: acme-prod-a001-us-west-1-customer-churn/solutions/customer-churn/
Key Features¶
Configuration Validation¶
docker run --rm \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action validate-config
IAM Policy Generation¶
docker run --rm \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs \
-v $(pwd)/s3/policies:/app/policies \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action create-policy
Template Validation¶
docker run --rm \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs:ro \
-v $(pwd)/s3/templates:/app/templates \
-v $(pwd)/s3/reports:/app/reports \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action validate-prov-template \
--solution master-solution
Change Preview¶
docker run --rm \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs:ro \
-v $(pwd)/s3/templates:/app/templates \
-v $(pwd)/s3/reports:/app/reports \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action show-changes \
--solution master-solution
Drift Detection¶
docker run --rm \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs:ro \
-v $(pwd)/s3/reports:/app/reports \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action check-drift
Infrastructure Cleanup¶
docker run --rm \
-e AWS_PROFILE=default \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs \
-v $(pwd)/s3/reports:/app/reports \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action tear-down \
--force
Generate Usage Assumptions¶
docker run --rm \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs \
-v $(pwd)/s3/templates:/app/templates \
-v $(pwd)/s3/reports:/app/reports \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action cost-traffic \
--solution master-solution
Estimate Monthly Costs¶
docker run --rm \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs \
-v $(pwd)/s3/templates:/app/templates \
-v $(pwd)/s3/reports:/app/reports \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action cost-estimate \
--solution master-solution
Refresh Resource Pricing¶
docker run --rm \
-v ~/.aws:/home/s3user/.aws:ro \
-v $(pwd)/s3/configs:/app/configs \
-v $(pwd)/s3/templates:/app/templates \
-v $(pwd)/s3/reports:/app/reports \
s3-provisioner:latest \
--config my-ml-project.yaml \
--action cost-refresh-prices \
--solution master-solution
AWS Credentials¶
Option 1: AWS Profile (Recommended)
-e AWS_PROFILE=default \
-v ~/.aws:/home/s3user/.aws:ro
Option 2: Environment Variables
-e AWS_ACCESS_KEY_ID=<key> \
-e AWS_SECRET_ACCESS_KEY=<secret> \
-e AWS_DEFAULT_REGION=us-west-1
Option 3: IAM Role (when running on EC2/ECS)
# No credentials needed - uses instance role
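For scripted runs, the three options can be reduced to a small helper that returns the matching `docker run` flags. This Python sketch is an illustration: the function name and the mode names `profile`, `env`, and `role` are hypothetical. For Option 2 it deliberately differs from the snippet above: it uses Docker's `-e NAME` form with no value, which copies the variable from the calling environment so the secret does not appear on the command line.

```python
import os


def credential_flags(mode, profile="default"):
    """Return docker run flags for one of the three credential options."""
    if mode == "profile":
        aws_dir = os.path.expanduser("~/.aws")
        return [
            "-e", f"AWS_PROFILE={profile}",
            "-v", f"{aws_dir}:/home/s3user/.aws:ro",
        ]
    if mode == "env":
        # "-e NAME" without "=value" passes NAME through from the host environment.
        return [
            "-e", "AWS_ACCESS_KEY_ID",
            "-e", "AWS_SECRET_ACCESS_KEY",
            "-e", "AWS_DEFAULT_REGION",
        ]
    if mode == "role":
        return []  # the instance or task role supplies credentials
    raise ValueError(f"unknown credential mode: {mode}")


print(credential_flags("env"))
```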
Documentation¶
All documentation is embedded in the Docker image:
# Copy all docs to local directory
docker run --rm \
-v $(pwd)/s3/docs:/output \
--entrypoint cp \
s3-provisioner:latest \
-r /app/docs/. /output/
Available Guides:
S3_FOLDERS.md - Complete folder structure reference
GOVERNANCE_COMPLIANCE.md - Enterprise governance reference architecture
USER_GUIDE.md - Complete command reference with 19 actions
CONFIGURATION.md - Configuration parameters and examples
IAM_PERMISSIONS.md - Required AWS permissions
ML_LIFECYCLE_POLICIES.md - Lifecycle policy details
COST_OPTIMIZATION.md - Cost optimization and estimation
TROUBLESHOOTING.md - Common issues and solutions
RELEASE_NOTES.md - Version history and features
System Requirements¶
Docker 20.10+
AWS account with S3 and CloudFormation permissions
512 MB RAM minimum
1 GB disk space
Support¶
See SUPPORT.md for assistance.
License¶
Commercial license via AWS Marketplace subscription.
Copyright © 2025 Axon Tech Labs. All rights reserved.
See LICENSE.txt for terms and conditions.