S3 Folder Structure Reference
Complete technical reference for the S3 folder structure provisioned by this tool.
Key Design Principles
| Principle | Description |
|---|---|
| Date Partitioning | Uses year/month/day folders (for example, 2024/01/01) |
| Naming Convention | Underscores for code folders, hyphens for non-code folders |
| Feature Store | Evolving model with feature stores in each data split |
| Environment Strategy | 3-environment model (dev → staging → prod) |
| Multi-Tenant | Company-agnostic design with client configuration support |
| Complete Pipeline | Raw → Curated → Processed → Inference data flow |
| Comprehensive Logging | Daily partitioned logs with structured JSON format |
| Enterprise Standards | Folder structure designed to support governance, compliance, and audit workflows |
Customization: This structure provides a comprehensive starting point. Clients can remove unused folders or add custom folders to match their specific ML workflows and organizational requirements.
Bucket Naming Convention
Amazon S3 bucket names must be globally unique across all AWS accounts and regions. Including identifying information such as the company name, the environment, and the purpose of the bucket in the name helps avoid collisions.
A bucket name can be auto-generated by the s3-provisioner application or provided by the user (e.g. examplecompany-stage-uswest2-54321-prod).
When auto-generated, the name is constructed from four parts:
{company_prefix}-{environment}-{tenant_id}-{region}
Here is the explanation of these parts:
| Field | Description | Example |
|---|---|---|
| | Short company identifier | "edge" |
| | AWS tenant ID | "a001" |
| | Environment (prod/dev/test) | "prod" |
| | AWS region | "us-west-2" |
| | Custom bucket name or "" | "edge-overriden-bucket" |
Examples:
- edge-dev-a001-us-west-1
- edge-dev-a001-us-west-2 -- different AWS region
- edge-prod-a001-us-west-2 -- different environment
- edge-prod-b001-us-west-2 -- different tenant
- techcorp-prod-b001-us-west-2 -- different company
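As an illustration, the four-part pattern above could be assembled and checked as in the following minimal Python sketch. The function name `build_bucket_name` and its `override` parameter are hypothetical and are not taken from the s3-provisioner application. The regular expression encodes the general S3 naming rules (3 to 63 characters, lowercase letters, digits and hyphens, starting and ending with a letter or digit) and, as a simplifying assumption, rejects dots even though S3 allows them.

```python
import re

# Assumption: a conservative subset of the S3 bucket naming rules
# (3-63 chars, lowercase letters, digits, hyphens; no dots).
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def build_bucket_name(company_prefix, environment, tenant_id, region, override=""):
    """Return the custom name if one is given, else the four-part generated name.

    Hypothetical helper for illustration only.
    """
    name = override or f"{company_prefix}-{environment}-{tenant_id}-{region}"
    if not _BUCKET_RE.match(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return name


print(build_bucket_name("edge", "dev", "a001", "us-west-1"))
# prints: edge-dev-a001-us-west-1
```

Passing a non-empty `override` mirrors the custom bucket name option in the table above; the generated parts are then ignored.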
Folder Structure (solutions only)
s3://{company_prefix}-{environment}-{tenant_id}-{region}/
└── solutions/
    ├── customer-churn/
    ├── demand-forecasting/
    ├── fraud-detection/
    ├── master-solution/
    ├── recommendation-engine/
    └── sentiment-analysis/
Folder Structure (master-solution one level down)
s3://{company_prefix}-{environment}-{tenant_id}-{region}/
└── solutions/
    └── master-solution/
        ├── artifacts/
        ├── code/
        ├── config/
        ├── data/
        ├── models/
        ├── notebooks/
        └── templates/
Folder Structure (master-solution two levels down)
s3://{company_prefix}-{environment}-{tenant_id}-{region}/
└── solutions/
    └── master-solution/
        ├── artifacts/
        │   ├── checkpoints/
        │   ├── logs/
        │   ├── metadata/
        │   ├── reports/
        │   ├── sagemaker-extensions/
        │   └── visualizations/
        ├── code/
        │   ├── feature_engineering/
        │   ├── inference/
        │   ├── monitoring/
        │   ├── pipelines/
        │   ├── preprocessing/
        │   ├── training/
        │   └── utils/
        ├── config/
        │   ├── environment_configs/
        │   └── model_configs/
        ├── data/
        │   ├── curated/
        │   ├── inference/
        │   ├── processed/
        │   └── raw/
        ├── models/
        │   ├── evaluation/
        │   ├── experiments/
        │   ├── registry/
        │   ├── training/
        │   └── tuning/
        ├── notebooks/
        │   ├── evaluation/
        │   ├── exploration/
        │   ├── inference/
        │   ├── preprocessing/
        │   └── training/
        └── templates/
            └── service-catalog/
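S3 has no real directories; a "folder" is a key prefix, commonly materialized as a zero-byte object whose key ends in "/". The sketch below shows one way the two-level layout above could be expanded into such keys. The `LAYOUT` dictionary and the `folder_keys` function are hypothetical names introduced for illustration and do not describe how the provisioning tool is implemented.

```python
# Hypothetical description of the two-level layout shown above:
# top-level folder -> its immediate sub-folders.
LAYOUT = {
    "artifacts": ["checkpoints", "logs", "metadata", "reports",
                  "sagemaker-extensions", "visualizations"],
    "code": ["feature_engineering", "inference", "monitoring", "pipelines",
             "preprocessing", "training", "utils"],
    "config": ["environment_configs", "model_configs"],
    "data": ["curated", "inference", "processed", "raw"],
    "models": ["evaluation", "experiments", "registry", "training", "tuning"],
    "notebooks": ["evaluation", "exploration", "inference", "preprocessing",
                  "training"],
    "templates": ["service-catalog"],
}


def folder_keys(solution, layout=LAYOUT):
    """Return the S3 keys (each ending in "/") for one solution's folders."""
    base = f"solutions/{solution}/"
    keys = [base]
    for top, children in layout.items():
        keys.append(f"{base}{top}/")
        keys.extend(f"{base}{top}/{child}/" for child in children)
    return keys


# Show the first few keys for the master-solution example.
for key in folder_keys("master-solution")[:4]:
    print(key)
```

Each key could then be written as an empty object, for example with the boto3 S3 client's `put_object(Bucket=..., Key=key)`. Whether the tool creates placeholder objects in this way is an assumption.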
Compacted Folder Structure (master-solution all nodes, folders only)
s3://{company_prefix}-{environment}-{tenant_id}-{region}/
└── solutions/
    └── master-solution/
        ├── artifacts/
        │   ├── checkpoints/
        │   │   ├── preprocessing_checkpoints/
        │   │   └── training_checkpoints/
        │   │       ├── xgboost/
        │   │       └── random_forest/
        │   ├── logs/
        │   │   ├── feature_engineering/
        │   │   │   └── daily_logs/
        │   │   ├── inference/
        │   │   │   ├── batch_inference/
        │   │   │   └── realtime_inference/
        │   │   ├── preprocessing/
        │   │   │   └── daily_logs/
        │   │   │       └── 2024/
        │   │   │           └── 01/
        │   │   │               └── 01/
        │   │   └── training/
        │   │       ├── hyperparameter_tuning_logs/
        │   │       └── training_job_logs/
        │   ├── metadata/
        │   │   ├── deployment_metadata/
        │   │   ├── governance/
        │   │   ├── preprocessing_metadata/
        │   │   └── training_metadata/
        │   ├── reports/
        │   │   ├── data_quality/
        │   │   │   └── daily_quality_reports/
        │   │   │       └── 2024/
        │   │   │           └── 01/
        │   │   │               └── 01/
        │   │   ├── feature_engineering/
        │   │   ├── model_evaluation/
        │   │   ├── model_training/
        │   │   ├── monitoring/
        │   │   └── validation/
        │   ├── sagemaker-extensions/
        │   └── visualizations/
        │       ├── data_exploration/
        │       ├── feature_analysis/
        │       ├── model_performance/
        │       └── monitoring/
        ├── code/
        │   ├── feature_engineering/
        │   │   └── tests/
        │   ├── inference/
        │   │   └── tests/
        │   ├── monitoring/
        │   │   └── tests/
        │   ├── pipelines/
        │   │   └── tests/
        │   ├── preprocessing/
        │   │   └── tests/
        │   ├── training/
        │   │   └── tests/
        │   └── utils/
        │       └── tests/
        ├── config/
        │   ├── environment_configs/
        │   └── model_configs/
        ├── data/
        │   ├── curated/
        │   │   ├── 2024/
        │   │   │   └── 01/
        │   │   │       └── 01/
        │   │   └── consolidated/
        │   │       ├── weekly/
        │   │       └── monthly/
        │   ├── inference/
        │   │   ├── batch/
        │   │   │   ├── input/
        │   │   │   └── output/
        │   │   └── realtime/
        │   │       ├── requests/
        │   │       │   └── 2024/
        │   │       │       └── 01/
        │   │       │           └── 01/
        │   │       └── responses/
        │   │           └── 2024/
        │   │               └── 01/
        │   │                   └── 01/
        │   ├── processed/
        │   │   ├── train/
        │   │   │   └── feature_store/
        │   │   ├── validation/
        │   │   │   └── feature_store/
        │   │   ├── test/
        │   │   │   └── feature_store/
        │   │   └── feature_engineering/
        │   │       ├── encoders/
        │   │       ├── feature_definitions/
        │   │       └── statistics/
        │   └── raw/
        │       ├── 2024/
        │       │   └── 01/
        │       │       └── 01/
        │       └── archive/
        ├── models/
        │   ├── evaluation/
        │   │   ├── model_comparison/
        │   │   │   └── performance_charts/
        │   │   ├── validation_results/
        │   │   └── monitoring/
        │   ├── experiments/
        │   │   ├── experiment_001/
        │   │   │   └── artifacts/
        │   │   └── experiment_002/
        │   ├── registry/
        │   │   ├── production/
        │   │   │   └── model_v1.0.0/
        │   │   ├── staging/
        │   │   └── development/
        │   ├── training/
        │   │   ├── xgboost/
        │   │   ├── random_forest/
        │   │   └── neural_network/
        │   └── tuning/
        │       └── tuning_job_001/
        │           ├── best_training_job/
        │           └── all_training_jobs/
        ├── notebooks/
        │   ├── evaluation/
        │   ├── exploration/
        │   ├── inference/
        │   ├── preprocessing/
        │   └── training/
        └── templates/
            └── service-catalog/
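The date-partitioned folders in the tree above (for example data/raw/2024/01/01/) follow a year/month/day pattern. A minimal sketch of how such a prefix could be derived from a date is shown below; the function name `date_partition_prefix` and its parameters are hypothetical and only illustrate the layout.

```python
from datetime import date


def date_partition_prefix(solution, area, day):
    """Build a year/month/day prefix under a solution folder.

    `area` is the path below the solution, e.g. "data/raw" or
    "artifacts/logs/preprocessing/daily_logs" (hypothetical helper).
    """
    # date objects accept strftime codes in format specs, so single-digit
    # months and days are zero-padded automatically.
    return f"solutions/{solution}/{area}/{day:%Y/%m/%d}/"


print(date_partition_prefix("master-solution", "data/raw", date(2024, 1, 1)))
# prints: solutions/master-solution/data/raw/2024/01/01/
```

Zero-padded components keep the prefixes in chronological order when listed lexicographically, which is one common reason for this kind of layout.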
Complete Folder Structure (folders and example files)
This section shows the complete folder structure with example files for any ML solution. The example uses customer-churn-prediction as a representative use case, but this structure applies to:
- Computer vision (image classification, object detection, segmentation)
- Natural language processing (sentiment analysis, text classification, NER)
- Time series forecasting (demand prediction, anomaly detection)
- Recommendation systems
- Fraud detection
- Any supervised/unsupervised ML workflow
The two bottom top-level folders (shared/ and client_config/) are optional organizational folders and are not provisioned by default.
s3://{company_prefix}-{environment}-{tenant_id}-{region}/
├── solutions/
│   └── customer-churn-prediction/
│       ├── artifacts/
│       │   ├── checkpoints/
│       │   │   ├── preprocessing_checkpoints/
│       │   │   └── training_checkpoints/
│       │   │       ├── xgboost/
│       │   │       │   ├── checkpoint_epoch_10.pkl
│       │   │       │   ├── checkpoint_epoch_20.pkl
│       │   │       │   └── checkpoint_final.pkl
│       │   │       └── random_forest/
│       │   ├── logs/
│       │   │   ├── feature_engineering/
│       │   │   │   ├── feature_engineering_pipeline.log
│       │   │   │   ├── categorical_features.log
│       │   │   │   └── daily_logs/
│       │   │   ├── inference/
│       │   │   │   ├── batch_inference/
│       │   │   │   └── realtime_inference/
│       │   │   ├── preprocessing/
│       │   │   │   ├── preprocessing_pipeline.log
│       │   │   │   ├── data_ingestion.log
│       │   │   │   ├── data_validation.log
│       │   │   │   ├── data_cleaning.log
│       │   │   │   └── daily_logs/
│       │   │   │       └── 2024/
│       │   │   │           └── 01/
│       │   │   │               └── 01/
│       │   │   │                   └── 01/preprocessing_001.json
│       │   │   └── training/
│       │   │       ├── hyperparameter_tuning_logs/
│       │   │       └── training_job_logs/
│       │   │           ├── xgboost_training.log
│       │   │           └── random_forest_training.log
│       │   ├── metadata/
│       │   │   ├── deployment_metadata/
│       │   │   │   ├── endpoint_configurations.json
│       │   │   │   └── model_deployment_history.json
│       │   │   ├── governance/
│       │   │   │   ├── data_governance_policies.json
│       │   │   │   └── audit_trail.json
│       │   │   ├── preprocessing_metadata/
│       │   │   │   ├── cleaning_summary.json
│       │   │   │   ├── transformation_summary.json
│       │   │   │   └── data_lineage.json
│       │   │   └── training_metadata/
│       │   │       ├── experiment_tracking.json
│       │   │       ├── model_versioning.json
│       │   │       └── hyperparameter_history.json
│       │   ├── reports/
│       │   │   ├── data_quality/
│       │   │   │   ├── raw_data_quality_report.html
│       │   │   │   ├── curated_data_quality_report.html
│       │   │   │   └── daily_quality_reports/
│       │   │   │       └── 2024/
│       │   │   │           └── 01/
│       │   │   │               └── 01/
│       │   │   │                   ├── customers_quality.html
│       │   │   │                   ├── transactions_quality.html
│       │   │   │                   └── usage_metrics_quality.html
│       │   │   ├── feature_engineering/
│       │   │   │   ├── feature_correlation_matrix.html
│       │   │   │   ├── feature_importance_report.html
│       │   │   │   └── feature_engineering_summary.html
│       │   │   ├── model_evaluation/
│       │   │   │   ├── performance_evaluation_report.html
│       │   │   │   ├── bias_fairness_report.html
│       │   │   │   └── model_interpretability_report.html
│       │   │   ├── model_training/
│       │   │   │   ├── training_summary_report.html
│       │   │   │   ├── hyperparameter_tuning_report.html
│       │   │   │   └── model_comparison_report.html
│       │   │   ├── monitoring/
│       │   │   │   ├── model_monitoring_dashboard.html
│       │   │   │   └── data_drift_report.html
│       │   │   └── validation/
│       │   │       ├── validation_summary.json
│       │   │       ├── schema_validation_report.html
│       │   │       └── data_quality_validation.html
│       │   ├── sagemaker-extensions/
│       │   └── visualizations/
│       │       ├── data_exploration/
│       │       │   ├── customer_demographics.png
│       │       │   ├── transaction_distributions.png
│       │       │   └── usage_patterns.png
│       │       ├── feature_analysis/
│       │       │   ├── feature_importance_plots.png
│       │       │   ├── correlation_heatmaps.png
│       │       │   └── shap_analysis.png
│       │       ├── model_performance/
│       │       │   ├── roc_curves.png
│       │       │   ├── precision_recall_curves.png
│       │       │   └── confusion_matrices.png
│       │       └── monitoring/
│       ├── code/
│       │   ├── feature_engineering/
│       │   │   ├── feature_engineering_pipeline.py
│       │   │   ├── categorical_features.py
│       │   │   ├── numerical_features.py
│       │   │   ├── feature_selection.py
│       │   │   ├── feature_validation.py
│       │   │   └── tests/
│       │   ├── inference/
│       │   │   ├── batch_inference.py
│       │   │   ├── realtime_inference.py
│       │   │   ├── model_serving.py
│       │   │   └── tests/
│       │   ├── monitoring/
│       │   │   ├── model_drift_detection.py
│       │   │   ├── data_quality_monitoring.py
│       │   │   ├── performance_monitoring.py
│       │   │   └── tests/
│       │   ├── pipelines/
│       │   │   ├── training_pipeline.py
│       │   │   ├── inference_pipeline.py
│       │   │   ├── monitoring_pipeline.py
│       │   │   └── tests/
│       │   ├── preprocessing/
│       │   │   ├── s3_event_handler.py
│       │   │   ├── preprocessing_pipeline.py
│       │   │   ├── data_ingestion.py
│       │   │   ├── data_validation.py
│       │   │   ├── data_cleaning.py
│       │   │   ├── data_transformation.py
│       │   │   ├── data_profiler.py
│       │   │   └── tests/
│       │   │       ├── test_data_ingestion.py
│       │   │       ├── test_data_validation.py
│       │   │       └── test_preprocessing_pipeline.py
│       │   ├── training/
│       │   │   ├── train_xgboost.py
│       │   │   ├── train_random_forest.py
│       │   │   ├── hyperparameter_tuning.py
│       │   │   ├── model_evaluation.py
│       │   │   └── tests/
│       │   └── utils/
│       │       ├── common_utils.py
│       │       ├── aws_utils.py
│       │       ├── data_utils.py
│       │       └── tests/
│       ├── config/
│       │   ├── environment_configs/
│       │   │   ├── development.yaml
│       │   │   ├── staging.yaml
│       │   │   └── production.yaml
│       │   ├── model_configs/
│       │   │   ├── xgboost_config.yaml
│       │   │   ├── random_forest_config.yaml
│       │   │   └── neural_network_config.yaml
│       │   ├── preprocessing_config.yaml
│       │   ├── feature_engineering_config.yaml
│       │   ├── training_config.yaml
│       │   ├── inference_config.yaml
│       │   └── monitoring_config.yaml
│       ├── data/
│       │   ├── curated/
│       │   │   ├── 2024/
│       │   │   │   └── 01/
│       │   │   │       └── 01/
│       │   │   │           ├── customers_cleaned_20240101.parquet
│       │   │   │           ├── transactions_cleaned_20240101.parquet
│       │   │   │           ├── support_tickets_cleaned_20240101.parquet
│       │   │   │           └── usage_metrics_cleaned_20240101.parquet
│       │   │   └── consolidated/
│       │   │       ├── weekly/
│       │   │       │   └── customers_week_01_2024.parquet
│       │   │       └── monthly/
│       │   │           └── customers_jan_2024.parquet
│       │   ├── inference/
│       │   │   ├── batch/
│       │   │   │   ├── input/
│       │   │   │   │   ├── batch_20240101.parquet
│       │   │   │   │   └── batch_20240102.parquet
│       │   │   │   └── output/
│       │   │   │       ├── predictions_20240101.parquet
│       │   │   │       └── predictions_20240102.parquet
│       │   │   └── realtime/
│       │   │       ├── requests/
│       │   │       │   └── 2024/
│       │   │       │       └── 01/
│       │   │       │           ├── 01/
│       │   │       │           └── 02/
│       │   │       └── responses/
│       │   │           └── 2024/
│       │   │               └── 01/
│       │   │                   ├── 01/
│       │   │                   └── 02/
│       │   ├── processed/
│       │   │   ├── train/
│       │   │   │   ├── features_train.parquet
│       │   │   │   ├── labels_train.parquet
│       │   │   │   ├── metadata_train.json
│       │   │   │   └── feature_store/
│       │   │   │       ├── customer_features.parquet
│       │   │   │       ├── transaction_features.parquet
│       │   │   │       ├── support_features.parquet
│       │   │   │       └── usage_features.parquet
│       │   │   ├── validation/
│       │   │   │   ├── features_validation.parquet
│       │   │   │   ├── labels_validation.parquet
│       │   │   │   ├── metadata_validation.json
│       │   │   │   └── feature_store/
│       │   │   ├── test/
│       │   │   │   ├── features_test.parquet
│       │   │   │   ├── labels_test.parquet
│       │   │   │   ├── metadata_test.json
│       │   │   │   └── feature_store/
│       │   │   └── feature_engineering/
│       │   │       ├── encoders/
│       │   │       │   ├── categorical_encoder.pkl
│       │   │       │   ├── numerical_scaler.pkl
│       │   │       │   └── feature_selector.pkl
│       │   │       ├── feature_definitions/
│       │   │       │   ├── feature_schema.json
│       │   │       │   ├── feature_catalog.json
│       │   │       │   └── feature_lineage.json
│       │   │       └── statistics/
│       │   │           ├── feature_stats.json
│       │   │           ├── correlation_matrix.json
│       │   │           └── importance_scores.json
│       │   └── raw/
│       │       ├── 2024/
│       │       │   └── 01/
│       │       │       └── 01/
│       │       │           ├── customers_20240101.csv
│       │       │           ├── transactions_20240101.csv
│       │       │           ├── support_tickets_20240101.json
│       │       │           └── usage_metrics_20240101.parquet
│       │       └── archive/
│       ├── models/
│       │   ├── evaluation/
│       │   │   ├── model_comparison/
│       │   │   │   ├── comparison_report.html
│       │   │   │   ├── metrics_comparison.json
│       │   │   │   └── performance_charts/
│       │   │   │       ├── roc_curves.png
│       │   │   │       ├── precision_recall.png
│       │   │   │       └── feature_importance.png
│       │   │   ├── validation_results/
│       │   │   └── monitoring/
│       │   ├── experiments/
│       │   │   ├── experiment_001/
│       │   │   │   ├── config.json
│       │   │   │   ├── metrics.json
│       │   │   │   ├── parameters.json
│       │   │   │   └── artifacts/
│       │   │   │       ├── model.pkl
│       │   │   │       ├── feature_importance.json
│       │   │   │       └── confusion_matrix.png
│       │   │   └── experiment_002/
│       │   ├── registry/
│       │   │   ├── production/
│       │   │   │   └── model_v1.0.0/
│       │   │   │       ├── model_package.json
│       │   │   │       ├── approval_status.json
│       │   │   │       └── deployment_config.json
│       │   │   ├── staging/
│       │   │   └── development/
│       │   ├── training/
│       │   │   ├── xgboost/
│       │   │   │   ├── model.tar.gz
│       │   │   │   ├── model_metadata.json
│       │   │   │   └── training_job_config.json
│       │   │   ├── random_forest/
│       │   │   └── neural_network/
│       │   └── tuning/
│       │       └── tuning_job_001/
│       │           ├── best_training_job/
│       │           │   ├── model.tar.gz
│       │           │   └── hyperparameters.json
│       │           ├── all_training_jobs/
│       │           └── tuning_results.json
│       ├── notebooks/
│       │   ├── evaluation/
│       │   │   ├── model_performance_analysis.ipynb
│       │   │   ├── bias_fairness_evaluation.ipynb
│       │   │   └── model_interpretability.ipynb
│       │   ├── exploration/
│       │   │   ├── customer_analysis.ipynb
│       │   │   ├── transaction_patterns.ipynb
│       │   │   ├── support_ticket_analysis.ipynb
│       │   │   └── churn_pattern_discovery.ipynb
│       │   ├── inference/
│       │   │   ├── batch_inference_testing.ipynb
│       │   │   └── realtime_inference_testing.ipynb
│       │   ├── preprocessing/
│       │   │   ├── data_quality_assessment.ipynb
│       │   │   ├── data_cleaning_validation.ipynb
│       │   │   └── preprocessing_pipeline_validation.ipynb
│       │   └── training/
│       │       ├── baseline_model_training.ipynb
│       │       ├── hyperparameter_tuning.ipynb
│       │       └── ensemble_model_training.ipynb
│       └── templates/
│           └── service-catalog/
├── shared/
│   ├── infrastructure/
│   ├── monitoring/
│   └── utilities/
└── client_config/
    ├── environments/
    ├── branding/
    └── policies/
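Because every object under solutions/ follows the same layout, a key can be split back into its solution, top-level area and, where present, its year/month/day partition. The following sketch illustrates this with regular expressions; `parse_key` and the shape of its return value are hypothetical and introduced only for this example.

```python
import re
from datetime import date

# solutions/<solution>/<area>/<rest of key>
_KEY_RE = re.compile(r"^solutions/(?P<solution>[^/]+)/(?P<area>[^/]+)/(?P<rest>.*)$")
# A YYYY/MM/DD run of path segments anywhere in the remainder of the key.
_DATE_RE = re.compile(r"(?:^|/)(\d{4})/(\d{2})/(\d{2})(?:/|$)")


def parse_key(key):
    """Split an object key into solution, area, date partition and file name.

    Hypothetical helper; raises ValueError for keys outside solutions/.
    """
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a solution key: {key!r}")
    found = _DATE_RE.search(match["rest"])
    partition = date(int(found[1]), int(found[2]), int(found[3])) if found else None
    return {
        "solution": match["solution"],
        "area": match["area"],
        "partition": partition,
        # Folder keys end in "/" and therefore have no file name.
        "filename": key.rsplit("/", 1)[-1] or None,
    }


info = parse_key(
    "solutions/customer-churn-prediction/data/raw/2024/01/01/customers_20240101.csv"
)
print(info["solution"], info["area"], info["partition"], info["filename"])
```

Keys under the optional shared/ and client_config/ folders are deliberately rejected here, since they sit outside solutions/.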
Copyright © 2025 Axon Tech Labs. All rights reserved.
See LICENSE.txt for terms and conditions.