README

Enterprise-grade S3 infrastructure provisioning tool purpose-built for machine learning workloads.

Why This Tool Exists

Most S3 provisioning tools create empty buckets. This tool creates production-ready ML pipeline infrastructure with a complete folder structure for:

  • Data ingestion (raw → curated → processed → inference)

  • Model training, evaluation, and registry

  • Feature engineering and feature stores

  • Notebooks, artifacts, logs, and reports

  • Code, configurations, and monitoring

One command deploys 130+ folders organized for enterprise ML operations.

Enterprise-Scale Versatility

  • Multi-Tenant: Deploy isolated ML infrastructure for multiple teams or clients

  • Multi-Region: Replicate infrastructure across AWS regions (us-west-1, eu-west-1, ap-southeast-1, etc.)

  • Multi-Environment: Separate dev, staging, and production environments with identical structure

  • Multi-Company: Support multiple companies with distinct configurations and branding

  • Flexible Bucket Naming: Auto-generate standardized names ({prefix}-{env}-{alias}-{region}) or use custom client-specified names

  • Multi-Solution Support: Deploy multiple ML solutions (customer-churn, fraud-detection, demand-forecasting) in a single shared bucket or in dedicated buckets per solution

  • VPC Integration: Integrates with VPC Provisioner to enable private S3 access via VPC endpoints, so all traffic stays within your private cloud with no internet exposure

  • Configuration-Driven Structure: Control the entire S3 folder hierarchy through simple YAML configuration; clients define bucket names, lifecycle policies, versioning, tags, and VPC settings without touching code

Example: deploying the customer-churn solution for 3 companies × 2 regions × 3 environments yields 18 isolated buckets from one configuration template.
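
The count in this example can be checked by enumerating the matrix. The sketch below only prints candidate names and does not call the provisioner. The company prefixes, the a001 alias, and the trailing -s3 suffix (taken from the Quick Start result) are assumptions for illustration; the tool's actual naming may differ.

```shell
# Build a bucket name following the {prefix}-{env}-{alias}-{region} pattern.
# The trailing "-s3" mirrors the Quick Start result and is an assumption.
bucket_name() {
  printf '%s-%s-%s-%s-s3\n' "$1" "$2" "$3" "$4"
}

# Enumerate one bucket per company x region x environment.
# The company prefixes and the alias are hypothetical placeholders.
list_buckets() {
  for prefix in acme globex initech; do
    for region in us-west-1 eu-west-1; do
      for env in dev staging prod; do
        bucket_name "$prefix" "$env" "a001" "$region"
      done
    done
  done
}

list_buckets
echo "total: $(list_buckets | wc -l | tr -d ' ')"
```

Each printed name would correspond to one configuration file derived from the same template, with only the prefix, environment, and region changed.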

What You Get

Battle-Tested ML Folder Structure

Saves weeks of infrastructure design time. Instead of designing folder hierarchies from scratch, you get a comprehensive, production-ready structure that covers:

  • Complete ML Pipeline: Raw → Curated → Processed → Inference data flow

  • Universal Applicability: Works for any ML domain (computer vision, NLP, time series, recommendation systems, fraud detection, etc.)

  • 130+ Organized Folders: Data, models, notebooks, artifacts, code, configs, and monitoring

  • Enterprise-Ready: Built-in support for governance, compliance, audit trails, and data lineage

  • Fully Customizable: Use as-is or adapt to your specific needs by removing unused folders or adding custom ones

See S3_FOLDERS.md for complete folder structure reference.

Bonus: Enterprise Governance Blueprint

Beyond infrastructure provisioning, you get a complete reference architecture for implementing governance, compliance, and audit capabilities:

  • Ready-to-use JSON schemas for audit logs, data lineage, and compliance metadata

  • Multi-framework compliance support (GDPR, HIPAA, SOC 2, ISO 27001, CCPA)

  • RBAC examples with role-based access patterns

  • Query templates for audit trail analysis

  • Implementation checklist with AWS service recommendations

See GOVERNANCE_COMPLIANCE.md for the complete governance framework.

ML-Optimized Folder Structure

solutions/
  customer-churn/
    data/
      raw/              # Ingested data with date partitioning
      curated/          # Cleaned and validated data
      processed/        # Feature-engineered training data
        train/
        validation/
        test/
        feature_engineering/
      inference/        # Batch and realtime predictions
    models/
      experiments/      # Experiment tracking
      training/         # Trained models by algorithm
      evaluation/       # Model comparison and monitoring
      registry/         # Production/staging/dev model versions
    notebooks/          # Jupyter notebooks by phase
    artifacts/          # Logs, checkpoints, visualizations, reports
    code/               # Pipeline code and tests
    config/             # Environment and model configurations
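
To make the layout concrete, the helper below builds an object key prefix under data/raw/ for a given date. The year=/month=/day= partition naming is an assumption for illustration only; the actual partition layout may differ, so check S3_FOLDERS.md before relying on it.

```shell
# Print the raw-data prefix for a solution and an ISO date (YYYY-MM-DD).
# The year=/month=/day= partition naming is hypothetical.
raw_prefix() {
  solution=$1
  date_iso=$2
  year=${date_iso%%-*}   # text before the first "-"
  rest=${date_iso#*-}    # text after the first "-"
  month=${rest%%-*}
  day=${rest#*-}
  printf 'solutions/%s/data/raw/year=%s/month=%s/day=%s/\n' \
    "$solution" "$year" "$month" "$day"
}

raw_prefix customer-churn 2025-01-31
```

The printed prefix can be appended to an s3://<bucket>/ URI when listing or uploading objects with the AWS CLI.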

Automated Lifecycle Policies

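Four preconfigured profiles for cost optimization:

  • ml-optimized: transition to Infrequent Access (IA) after 30 days and to Glacier after 90 days (60-70% cost savings)

  • compliance: transition to Glacier after 90 days, with 7-year retention (70-80% savings)

  • development: objects expire after 90 days (100% savings on expired data)

  • none: manual management, with no automated transitions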
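
For readers who want to see what the ml-optimized profile amounts to at the S3 API level, the sketch below prints an equivalent lifecycle configuration document. It assumes that "IA" means the STANDARD_IA storage class and that the rule applies to the whole bucket. The rule ID is made up, and the provisioner's own generated rules may be structured differently.

```shell
# Print an S3 lifecycle configuration matching the ml-optimized description:
# move objects to STANDARD_IA after 30 days and to GLACIER after 90 days.
# The rule ID and the empty prefix filter (whole bucket) are assumptions.
ml_optimized_lifecycle() {
  cat <<'EOF'
{
  "Rules": [
    {
      "ID": "ml-optimized-sketch",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
}

ml_optimized_lifecycle
```

A document of this shape is what aws s3api put-bucket-lifecycle-configuration accepts through its --lifecycle-configuration argument. With this tool, the profile is selected through the lifecycle_policy setting instead.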

Infrastructure as Code

  • CloudFormation-based deployment with stack outputs for multi-stack orchestration

  • One-command cleanup via CloudFormation stack deletion

  • 67 leaf folders created via CloudFormation templates

  • Lambda-based folder creation for remaining structure

  • IAM policies auto-generated

  • VPC endpoint support

  • Automated tagging (7 system + custom tags)

  • Local template validation (YAML syntax, structure, reference integrity)

  • Infrastructure drift detection against deployed stacks

  • Change preview via CloudFormation ChangeSets

  • Safe test deployments with isolated resource names

  • Built-in cost estimation with region-specific pricing

Quick Start

1. Create Configuration

s3/configs/my-ml-project.yaml (the host directory mounted at /app/configs in the commands below):

client:
  company_name: "Acme Corp"
  company_prefix: "acme"
  account_id: "123456789012"
  tenant_id: "a001"

environment:
  env: "prod"
  region: "us-west-1"

s3:
  bucket_name_override: ""
  versioning: true
  lifecycle_policy: "ml-optimized"
  vpc_id: ""
  route_table_ids: ""
  tags:
    Project: "Customer Churn ML"
    Owner: "data-science-team"

2. Deploy Master Solution

docker run --rm \
  -e AWS_PROFILE=default \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs \
  -v $(pwd)/s3/reports:/app/reports \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action prep-master \
  --solution master-solution \
  --force

Result: Bucket acme-prod-a001-us-west-1-s3 created with complete ML folder structure.

3. Deploy Additional Solutions

# Deploy customer churn solution
docker run --rm \
  -e AWS_PROFILE=default \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs \
  -v $(pwd)/s3/reports:/app/reports \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action deploy-solution \
  --solution customer-churn

Available Solutions: customer-churn, demand-forecasting, fraud-detection

Common Workflows

Pattern A: Shared Bucket (Multiple Solutions)

# The flags below are appended to the docker run command shown in Quick Start
# 1. Create master structure
--action prep-master --solution master-solution --force

# 2. Deploy solutions
--action deploy-solution --solution customer-churn
--action deploy-solution --solution fraud-detection

Result: acme-prod-a001-us-west-1-s3/solutions/{customer-churn,fraud-detection}/
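
Because every action repeats the same docker run arguments, a small wrapper function can keep them in one place. The sketch below is a convenience under stated assumptions: the function name s3prov and the DRY_RUN switch are hypothetical and not part of the tool, while the mounts and image name are copied from Quick Start.

```shell
# Hypothetical wrapper: s3prov <config-file> <provisioner flags...>
# With DRY_RUN=1 it prints the docker command instead of running it.
s3prov() {
  s3prov_config=$1
  shift
  # Rebuild the argument list as the full docker invocation.
  set -- docker run --rm \
    -e "AWS_PROFILE=${AWS_PROFILE:-default}" \
    -v "$HOME/.aws:/home/s3user/.aws:ro" \
    -v "$PWD/s3/configs:/app/configs" \
    -v "$PWD/s3/reports:/app/reports" \
    s3-provisioner:latest \
    --config "$s3prov_config" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # show the command without running it
  else
    "$@"
  fi
}

# Pattern A as a dry run: print the three commands instead of executing them.
DRY_RUN=1 s3prov my-ml-project.yaml --action prep-master --solution master-solution --force
for solution in customer-churn fraud-detection; do
  DRY_RUN=1 s3prov my-ml-project.yaml --action deploy-solution --solution "$solution"
done
```

Dropping DRY_RUN=1 runs the commands for real. Quoting the volume paths also keeps the mounts intact when the working directory contains spaces.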

Pattern B: Dedicated Buckets (One Solution Per Bucket)

# customer-churn-config.yaml
s3:
  bucket_name_override: "acme-prod-a001-us-west-1-customer-churn"

# Create the dedicated bucket using that configuration
--action prep-master --solution customer-churn --force

Result: acme-prod-a001-us-west-1-customer-churn/solutions/customer-churn/
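
A custom bucket_name_override still has to satisfy the S3 bucket naming rules. The check below is a minimal sketch that covers only a subset of them (length, allowed characters, alphanumeric first and last character, no adjacent dots). The function name is hypothetical, and the tool's own validate-config action may apply different or stricter checks.

```shell
# Return 0 if the name passes a subset of the S3 bucket naming rules:
# 3-63 characters, only lowercase letters, digits, dots and hyphens,
# a letter or digit at both ends, and no two adjacent dots.
# Other rules (for example, no IP-address-style names) are not checked.
is_valid_bucket_name() {
  case "$1" in
    *..*) return 1 ;;   # adjacent dots are not allowed
  esac
  # LC_ALL=C makes a-z mean ASCII lowercase only.
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

for candidate in acme-prod-a001-us-west-1-customer-churn Acme_Bucket; do
  if is_valid_bucket_name "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

Running a check like this before prep-master avoids a failed stack creation caused by a name that S3 would reject.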

Key Features

Configuration Validation

docker run --rm \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action validate-config

IAM Policy Generation

docker run --rm \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs \
  -v $(pwd)/s3/policies:/app/policies \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action create-policy

Template Validation

docker run --rm \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs:ro \
  -v $(pwd)/s3/templates:/app/templates \
  -v $(pwd)/s3/reports:/app/reports \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action validate-prov-template \
  --solution master-solution

Change Preview

docker run --rm \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs:ro \
  -v $(pwd)/s3/templates:/app/templates \
  -v $(pwd)/s3/reports:/app/reports \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action show-changes \
  --solution master-solution

Drift Detection

docker run --rm \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs:ro \
  -v $(pwd)/s3/reports:/app/reports \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action check-drift

Infrastructure Cleanup

docker run --rm \
  -e AWS_PROFILE=default \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs \
  -v $(pwd)/s3/reports:/app/reports \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action tear-down \
  --force

Generate Usage Assumptions

docker run --rm \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs \
  -v $(pwd)/s3/templates:/app/templates \
  -v $(pwd)/s3/reports:/app/reports \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action cost-traffic \
  --solution master-solution

Estimate Monthly Costs

docker run --rm \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs \
  -v $(pwd)/s3/templates:/app/templates \
  -v $(pwd)/s3/reports:/app/reports \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action cost-estimate \
  --solution master-solution

Refresh Resource Pricing

docker run --rm \
  -v ~/.aws:/home/s3user/.aws:ro \
  -v $(pwd)/s3/configs:/app/configs \
  -v $(pwd)/s3/templates:/app/templates \
  -v $(pwd)/s3/reports:/app/reports \
  s3-provisioner:latest \
  --config my-ml-project.yaml \
  --action cost-refresh-prices \
  --solution master-solution

AWS Credentials

Option 1: AWS Profile (Recommended)

-e AWS_PROFILE=default \
-v ~/.aws:/home/s3user/.aws:ro

Option 2: Environment Variables

-e AWS_ACCESS_KEY_ID=<key> \
-e AWS_SECRET_ACCESS_KEY=<secret> \
-e AWS_DEFAULT_REGION=us-west-1

Option 3: IAM Role (when running on EC2/ECS)

# No credential flags needed: the container uses the EC2 instance role or the ECS task role
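
The three options can be folded into one helper that prints the docker flags for whichever credentials are present in the current shell. This is a sketch only: the function name and the precedence (static keys, then profile, then role) are assumptions, not documented tool behavior.

```shell
# Print the docker flags for whichever credential option applies.
# The precedence used here is an assumption made for this sketch.
aws_cred_flags() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    # "-e NAME" with no value forwards the exported host variable,
    # keeping secrets out of the command line and shell history.
    printf '%s\n' "-e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION"
  elif [ -n "${AWS_PROFILE:-}" ]; then
    printf '%s\n' "-e AWS_PROFILE=$AWS_PROFILE -v $HOME/.aws:/home/s3user/.aws:ro"
  else
    # Option 3: rely on the role attached to the host or task.
    printf '\n'
  fi
}

aws_cred_flags
```

The output is meant to be spliced unquoted into a docker run command, for example docker run --rm $(aws_cred_flags) s3-provisioner:latest followed by the usual flags. That relies on word splitting, so it assumes the home directory path contains no spaces, and the key variables must be exported for docker to forward them.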

Documentation

All documentation is embedded in the Docker image:

# Copy all docs to local directory
docker run --rm \
  -v $(pwd)/s3/docs:/output \
  --entrypoint cp \
  s3-provisioner:latest \
  -r /app/docs/. /output/

Available Guides:

  • S3_FOLDERS.md - Complete folder structure reference

  • GOVERNANCE_COMPLIANCE.md - Enterprise governance reference architecture

  • USER_GUIDE.md - Complete command reference with 19 actions

  • CONFIGURATION.md - Configuration parameters and examples

  • IAM_PERMISSIONS.md - Required AWS permissions

  • ML_LIFECYCLE_POLICIES.md - Lifecycle policy details

  • COST_OPTIMIZATION.md - Cost optimization and estimation

  • TROUBLESHOOTING.md - Common issues and solutions

  • RELEASE_NOTES.md - Version history and features

System Requirements

  • Docker 20.10+

  • AWS account with S3 and CloudFormation permissions

  • 512 MB RAM minimum

  • 1 GB disk space

Support

See SUPPORT.md for assistance.

License

Commercial license via AWS Marketplace subscription.


Copyright © 2025 Axon Tech Labs. All rights reserved.

See LICENSE.txt for terms and conditions.