Troubleshooting
Quick Diagnostics
# Check AWS credentials
aws sts get-caller-identity
# Check Docker version
docker --version
# Test AWS VPC access
aws ec2 describe-vpcs --region us-west-2
# Validate configuration
docker run --rm \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action validate-config
Common Pitfalls
1. Not quoting account_id in YAML
❌ account_id: 123456789012 (parsed as a number, so an ID that starts with 0 loses its leading zeros)
✅ account_id: "123456789012" (preserves format)
2. Skipping configuration validation
Always run validate-config before AWS operations
Catches most errors before deployment, which saves time and avoids failed stacks
3. Not reviewing generated templates
Review CloudFormation templates before create-vpc
Use validate-prov-template to catch reference errors locally
Prevents unexpected resource creation
4. Using access keys in production
❌ Environment variables with long-lived access keys
✅ IAM roles (EC2/ECS) or AWS profiles with MFA
Access keys are visible in process lists and Docker history
5. Overlapping CIDR blocks
Plan CIDR allocation before deployment
Document CIDR usage in a central registry
Ensure new VPC CIDRs don't overlap with existing VPCs or on-premises networks
Common Errors
AWS Credentials
Error: Unable to locate credentials
Error: Unable to locate credentials. You can configure credentials by running "aws configure".
Solution:
# Option 1: Environment variables
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_DEFAULT_REGION=us-west-2
# Option 2: AWS profile
export AWS_PROFILE=default
# Verify
aws sts get-caller-identity
Error: The security token included in the request is invalid
Error: The security token included in the request is invalid
Causes:
Expired temporary credentials
Invalid access key
Credentials from different account
Solution:
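# Request new temporary credentials (needs valid long-term credentials;
# the command only prints the new credentials, it does not install them)
aws sts get-session-token
# Or fall back to the long-term credentials stored in your AWS profile
cat ~/.aws/credentials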
# Verify current identity
aws sts get-caller-identity
Error: Access Denied
Error: An error occurred (UnauthorizedOperation) when calling the CreateVpc operation
Solution:
# Check current user
aws sts get-caller-identity
# Generate IAM policy
docker run --rm \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/policies:/app/policies \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action create-policy
# Review and attach generated policy (written to the mounted vpc/policies directory)
cat vpc/policies/edge-prod-a001-us-west-2-vpc-iam-policy.json
# See IAM_PERMISSIONS.md for required permissions
Configuration Errors
Error: Invalid CIDR block
Error: Invalid CIDR block format
Solution:
# Valid (AWS accepts VPC CIDR blocks from /16 to /28)
vpc:
  cidr_block: 10.0.0.0/16
# Invalid (two separate examples)
vpc:
  cidr_block: 10.0.0.0/8   # Too large
  cidr_block: 10.0.0.0/29  # Too small
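The size rule above can be checked locally before running the tool. The following is a minimal sketch using Python's standard-library ipaddress module; the function name check_vpc_cidr is hypothetical, the /16 to /28 bounds are AWS's limits for VPC IPv4 CIDR blocks, and this is not the tool's own validator.

```python
import ipaddress


def check_vpc_cidr(cidr: str) -> str:
    """Return 'ok' or a short reason why the CIDR is unsuitable for a VPC."""
    try:
        # strict=True rejects values with host bits set, e.g. 10.0.0.1/16
        network = ipaddress.ip_network(cidr, strict=True)
    except ValueError as exc:
        return f"invalid: {exc}"
    if network.version != 4:
        return "invalid: expected an IPv4 CIDR"
    if network.prefixlen < 16:
        return "too large"  # more addresses than a /16
    if network.prefixlen > 28:
        return "too small"  # fewer addresses than a /28
    return "ok"
```

Calling check_vpc_cidr with the three values from the example classifies 10.0.0.0/16 as ok, 10.0.0.0/8 as too large, and 10.0.0.0/29 as too small.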
Error: Subnet CIDR not within VPC CIDR
Error: Subnet CIDR must be within VPC CIDR range
Solution:
# VPC: 10.0.0.0/16
# Valid subnet
vpc:
  subnets:
    public:
      - cidr: 10.0.1.0/24  # Within VPC CIDR
# Invalid subnet
vpc:
  subnets:
    public:
      - cidr: 192.168.1.0/24  # Outside VPC CIDR
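Containment can be verified by hand with the standard-library ipaddress module. This is a small sketch, not the tool's implementation; subnet_within_vpc is a hypothetical helper name, and subnet_of requires Python 3.7 or later.

```python
import ipaddress


def subnet_within_vpc(vpc_cidr: str, subnet_cidr: str) -> bool:
    """Return True when every address of the subnet lies inside the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    return subnet.subnet_of(vpc)
```

With the values above, 10.0.1.0/24 is inside 10.0.0.0/16 and 192.168.1.0/24 is not.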
Error: Overlapping subnet CIDRs
Error: Subnet CIDRs cannot overlap
Solution:
# Wrong
vpc:
  subnets:
    public:
      - cidr: 10.0.1.0/24
    private:
      - cidr: 10.0.1.0/24  # Overlaps!
# Correct
vpc:
  subnets:
    public:
      - cidr: 10.0.1.0/24
    private:
      - cidr: 10.0.11.0/24  # No overlap
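To find which subnets collide in a larger configuration, every pair of CIDRs can be compared. The sketch below assumes you have already collected the CIDR strings into a list; find_overlaps is a hypothetical name and the check is illustrative rather than the tool's own logic.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of CIDR strings that share at least one address."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]
```

An empty result means no two subnets overlap. The same function can be used to compare a planned VPC CIDR against the entries of a CIDR registry.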
Error: Configuration validation failed
Error: Configuration validation failed: 'client' is a required property
Solution: Ensure configuration has all three required sections:
client:
  company_name: Edge Corp
  company_prefix: edge
  account_id: "123456789012"
  tenant_id: "a001"
environment:
  env: prod
  region: us-west-2
vpc:
  vpc_name_override: ""
  cidr_block: 10.0.0.0/16
  availability_zones:
    - us-west-2a
  subnets:
    public:
      - name: public-subnet-1
        cidr: 10.0.1.0/24
        az: us-west-2a
  internet_gateway:
    enabled: true
  nat_gateway:
    enabled: false
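The three-section requirement can be illustrated with a few lines of Python operating on the configuration after it has been parsed into a dictionary. This is a sketch only: missing_sections is a hypothetical helper, and the tool's real schema validation checks more than the presence of top-level sections.

```python
REQUIRED_SECTIONS = ("client", "environment", "vpc")


def missing_sections(config: dict) -> list:
    """List required top-level sections that are absent or not mappings."""
    return [
        name
        for name in REQUIRED_SECTIONS
        if not isinstance(config.get(name), dict)
    ]
```

A configuration that omits the client section would yield ['client'], which corresponds to the "'client' is a required property" message above.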
Error: Invalid availability zone
Error: Availability zone not in list
Solution:
# Wrong
vpc:
  availability_zones:
    - us-west-2a
  subnets:
    public:
      - az: us-west-2c  # Not in AZ list!
# Correct
vpc:
  availability_zones:
    - us-west-2a
    - us-west-2c
  subnets:
    public:
      - az: us-west-2c  # In AZ list
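The rule is that every subnet's az must appear in vpc.availability_zones. A minimal sketch of that check, assuming the vpc section has been parsed into a dictionary shaped like the examples above (subnets_with_unlisted_az is a hypothetical name):

```python
def subnets_with_unlisted_az(vpc: dict) -> list:
    """Return (tier, subnet name, az) for subnets whose az is not declared."""
    allowed = set(vpc.get("availability_zones", []))
    problems = []
    for tier, subnets in vpc.get("subnets", {}).items():
        for subnet in subnets:
            if subnet.get("az") not in allowed:
                problems.append((tier, subnet.get("name"), subnet.get("az")))
    return problems
```

An empty list means every subnet references a declared availability zone.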
Error: Account ID not quoted
Error: Account ID lost leading zeros
Solution:
# Wrong (parsed as a number; an ID that starts with 0 loses its leading zeros)
client:
  account_id: 123456789012
# Correct (a quoted string preserves leading zeros)
client:
  account_id: "123456789012"
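The reason quoting matters is that a number has no leading zeros to keep. The sketch below shows the effect and a simple guard; check_account_id is a hypothetical helper, not part of the tool.

```python
import re


def check_account_id(value) -> str:
    """Accept only a 12-digit string; reject numbers and malformed strings."""
    if not isinstance(value, str):
        raise TypeError(
            "account_id must be a quoted string, got " + type(value).__name__
        )
    if not re.fullmatch(r"\d{12}", value):
        raise ValueError("account_id must be exactly 12 digits")
    return value


# Converting a 12-digit ID with a leading zero to a number drops the zero,
# leaving only 11 digits.
assert len(str(int("012345678901"))) == 11
```

Passing the integer 123456789012 raises TypeError, while the string "012345678901" is returned unchanged.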
CloudFormation Errors
Error: Stack already exists
Error: Stack [edge-prod-a001-us-west-2-vpc-stack] already exists
Solution:
# Option 1: Delete existing stack first
docker run --rm \
-v ~/.aws:/home/vpcuser/.aws:ro \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action delete-vpc \
--force
# Option 2: Use a different VPC name (set in the configuration file)
vpc:
  vpc_name_override: "my-vpc-v2"
Error: CloudFormation rollback
Error: Stack creation failed and rolled back
Solution:
# Check CloudFormation events
aws cloudformation describe-stack-events \
--stack-name edge-prod-a001-us-west-2-vpc-stack \
--max-items 20
# Common causes:
# 1. Insufficient IAM permissions
# 2. Invalid CIDR blocks
# 3. Overlapping subnets
# 4. Invalid availability zones
# 5. NAT Gateway without Internet Gateway
# Review logs (written to the mounted vpc/reports directory)
cat vpc/reports/*.log
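When a stack rolls back, the earliest failed event usually names the root cause, and later failures are often just cancellations. A short sketch for pulling those events out of the JSON returned by describe-stack-events follows. It assumes the output was saved with --output json and that you pass in the StackEvents list; first_failures is a hypothetical helper, and sorting relies on the timestamps being ISO 8601 strings in the same time zone.

```python
def first_failures(stack_events: list, limit: int = 5) -> list:
    """Return the earliest failed events as (resource, status, reason) tuples."""
    failed = [
        event
        for event in stack_events
        if event.get("ResourceStatus", "").endswith("_FAILED")
    ]
    # Oldest first: the root cause normally precedes the cancellations.
    failed.sort(key=lambda event: event.get("Timestamp", ""))
    return [
        (
            event.get("LogicalResourceId"),
            event.get("ResourceStatus"),
            event.get("ResourceStatusReason", ""),
        )
        for event in failed[:limit]
    ]
```

For example, load the saved file with json.load and call first_failures(data["StackEvents"]) to print the first few failures with their reasons.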
Error: VPC limit exceeded
Error: VpcLimitExceeded: The maximum number of VPCs has been reached
Solution:
# Check VPC limit (quota L-F678F1CE is "VPCs per Region")
aws service-quotas get-service-quota \
--service-code vpc \
--quota-code L-F678F1CE
# List existing VPCs
aws ec2 describe-vpcs --region us-west-2
# Delete unused VPCs or request limit increase
aws service-quotas request-service-quota-increase \
--service-code vpc \
--quota-code L-F678F1CE \
--desired-value 10
Template Validation Errors
Error: Template file not found
Error: Template file not found: templates/edge-prod-a001-us-west-2-vpc-template.yaml
Solution: Generate the template first, or let validate-prov-template auto-generate it:
docker run --rm \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/templates:/app/templates \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action validate-prov-template
Note: If the template does not exist, it will be generated automatically before validation.
Error: Invalid !Ref target
Error: !Ref target 'InvalidResource' not found in Resources or Parameters
Causes:
Template references a resource that doesn't exist
Typo in resource name
Template was manually edited
Solution: Regenerate the template from configuration:
docker run --rm \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/templates:/app/templates \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action create-prov-template
Change Preview Errors
Error: Stack does not exist for show-changes
Error: Stack [edge-prod-a001-us-west-2-vpc-stack] does not exist
Cause: show-changes requires a deployed stack to compare against.
Solution: Deploy the stack first, then preview changes:
# Deploy first
docker run --rm \
-v ~/.aws:/home/vpcuser/.aws:ro \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/templates:/app/templates \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action create-vpc \
--force
# Then preview changes
docker run --rm \
-v ~/.aws:/home/vpcuser/.aws:ro \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/templates:/app/templates \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action show-changes
Error: No changes detected
Symptoms: show-changes reports no pending changes.
Cause: The deployed stack matches the current template.
Solution: This is expected when no configuration changes have been made. Modify your configuration or template, then re-run show-changes.
Drift Detection Errors
Error: Stack does not exist for check-drift
Error: Stack [edge-prod-a001-us-west-2-vpc-stack] does not exist
Cause: check-drift requires a deployed stack to detect drift against.
Solution: Deploy the stack first:
docker run --rm \
-v ~/.aws:/home/vpcuser/.aws:ro \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/templates:/app/templates \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action create-vpc \
--force
Error: Drift detection timeout
Symptoms: Drift detection takes too long or times out.
Causes:
Large number of resources in the stack (subnets, NAT gateways, route tables)
AWS API throttling
Solution:
# Check drift detection status manually
aws cloudformation describe-stack-drift-detection-status \
--stack-drift-detection-id <detection-id>
# Retry
docker run --rm \
-v ~/.aws:/home/vpcuser/.aws:ro \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action check-drift
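When throttling is the cause, spacing out retries tends to work better than retrying immediately. The following is a generic sketch of retrying with exponential backoff and jitter. It is not how the tool itself retries; retry_with_backoff and its parameters are hypothetical, and in real use the except clause should be narrowed to throttling errors.

```python
import random
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call operation() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out clients that were throttled at the same moment.
            sleep(delay + random.uniform(0, delay / 2))
```

The operation argument would typically be a function that starts or polls drift detection; the sleep parameter exists so the waits can be replaced in tests.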
NAT Gateway Errors
Error: NAT Gateway requires Internet Gateway
Error: NAT Gateway requires Internet Gateway to be enabled
Solution:
# Wrong
vpc:
  internet_gateway:
    enabled: false
  nat_gateway:
    enabled: true  # Requires IGW!
# Correct
vpc:
  internet_gateway:
    enabled: true
  nat_gateway:
    enabled: true
Error: NAT Gateway requires public subnet
Error: NAT Gateway requires at least one public subnet
Solution:
# Wrong
vpc:
  subnets:
    private:
      - name: private-subnet-1
        cidr: 10.0.11.0/24
        az: us-west-2a
  nat_gateway:
    enabled: true  # No public subnet!
# Correct
vpc:
  subnets:
    public:
      - name: public-subnet-1
        cidr: 10.0.1.0/24
        az: us-west-2a
    private:
      - name: private-subnet-1
        cidr: 10.0.11.0/24
        az: us-west-2a
  nat_gateway:
    enabled: true
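Both NAT Gateway preconditions (an enabled Internet Gateway and at least one public subnet) can be expressed as a small check over the parsed vpc section. This is an illustrative sketch with a hypothetical function name; the message strings are copied from the errors documented above.

```python
def nat_gateway_problems(vpc: dict) -> list:
    """Return the NAT Gateway precondition errors for a parsed vpc section."""
    problems = []
    if not vpc.get("nat_gateway", {}).get("enabled", False):
        return problems  # nothing to check when NAT is disabled
    if not vpc.get("internet_gateway", {}).get("enabled", False):
        problems.append("NAT Gateway requires Internet Gateway to be enabled")
    if not vpc.get("subnets", {}).get("public"):
        problems.append("NAT Gateway requires at least one public subnet")
    return problems
```

A configuration with NAT enabled, no Internet Gateway, and only private subnets would report both problems at once.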
Error: Elastic IP limit exceeded
Error: AddressLimitExceeded: The maximum number of addresses has been reached
Solution:
# Check Elastic IP limit for VPC addresses
aws ec2 describe-account-attributes \
--attribute-names vpc-max-elastic-ips
# List existing Elastic IPs
aws ec2 describe-addresses --region us-west-2
# Release unused Elastic IPs
aws ec2 release-address --allocation-id eipalloc-xxxxx
# Or request limit increase
Docker Errors
Error: Cannot connect to Docker daemon
Error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock
Solution:
# Start Docker (Linux with systemd)
sudo systemctl start docker
# Or, with Docker Desktop:
# start the Docker Desktop application
# Verify
docker ps
Error: Permission denied accessing volume
Error: Permission denied: '/app/configs/your-config.yaml'
Solution:
# Check file permissions (paths relative to the directory used for $(pwd))
ls -la vpc/configs/
# Fix permissions
chmod 644 vpc/configs/your-config.yaml
# Or run with user mapping
docker run --rm --user $(id -u):$(id -g) \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action validate-config
Error: Volume mount not found
Error: No such file or directory: '/app/configs/your-config.yaml'
Solution:
# Verify file exists on host (in the directory you mount as /app/configs)
ls -la vpc/configs/your-config.yaml
# Use absolute path
docker run --rm \
-v ~/mlops-infra-suite/vpc/configs:/app/configs:ro \
-v ~/mlops-infra-suite/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action validate-config
# Or use $(pwd)
docker run --rm \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action validate-config
Deletion Errors
Error: VPC has dependencies
Error: DependencyViolation: The vpc has dependencies and cannot be deleted
Solution:
# Use CloudFormation delete (recommended)
docker run --rm \
-v ~/.aws:/home/vpcuser/.aws:ro \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action delete-vpc \
--force
# CloudFormation handles dependency order automatically
Error: Stack deletion failed
Error: CloudFormation stack deletion failed
Solution:
# Check stack events
aws cloudformation describe-stack-events \
--stack-name edge-prod-a001-us-west-2-vpc-stack
# Common causes:
# 1. Resources in use (EC2 instances, RDS, etc.)
# 2. Manual changes outside CloudFormation
# 3. Insufficient permissions
# Manually delete blocking resources first
aws ec2 describe-instances --filters "Name=vpc-id,Values=vpc-xxxxx"
# Then retry stack deletion
Error: NAT Gateway still deleting
Error: NAT Gateway is still in 'deleting' state
Solution:
# NAT Gateway deletion takes 5-10 minutes
# Wait for completion
aws ec2 describe-nat-gateways \
--nat-gateway-ids nat-xxxxx
# Check status
# State: pending | available | deleting | deleted | failed
Cost Estimation Errors
Error: Template not found
Message: Template not found. Run 'create-prov-template' first.
Cause: The cost-traffic or cost-estimate action requires a generated CloudFormation template.
Solution:
# Generate the template first
--action create-prov-template
# Then run cost estimation
--action cost-traffic
--action cost-estimate
Error: Traffic file not found
Message: Traffic file not found. Run 'cost-traffic' first.
Cause: The cost-estimate action requires a traffic assumptions file.
Solution:
# Generate traffic assumptions
--action cost-traffic
# Edit the file to match your expected usage
# Then run cost estimate
--action cost-estimate
Error: Failed to refresh pricing
Message: Failed to refresh pricing: ...
Cause: The cost-refresh-prices action could not reach the AWS Pricing API. Common reasons: no AWS credentials, no internet access, or insufficient permissions.
Solution:
Verify AWS credentials are configured
Ensure network connectivity to AWS APIs
The tool falls back to built-in pricing data, so cost-estimate still works without refreshing
License Validation Errors
Error: License validation failed
Error: AWS Marketplace subscription not found
Solution for Testing:
# Run with license validation (requires AWS Marketplace subscription)
docker run --rm \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action validate-config
Solution for Production:
Verify AWS Marketplace subscription is active
Check IAM permissions for AWS Marketplace
Contact AWS Marketplace support
Performance Issues
Slow CloudFormation Stack Creation
Symptoms: Stack creation takes longer than expected (>10 minutes)
Causes:
NAT Gateway creation (5-10 minutes)
Multiple availability zones
AWS API throttling
Solution:
# Monitor stack events
aws cloudformation describe-stack-events \
--stack-name edge-prod-a001-us-west-2-vpc-stack
# Check AWS service health
curl https://status.aws.amazon.com/
# Wait for completion (can take 5-15 minutes)
aws cloudformation wait stack-create-complete \
--stack-name edge-prod-a001-us-west-2-vpc-stack
Timeout Errors
Symptoms: Operation times out before completion
Solution:
# Check CloudFormation stack status
aws cloudformation describe-stacks \
--stack-name edge-prod-a001-us-west-2-vpc-stack
# If stack is still creating, wait for completion
aws cloudformation wait stack-create-complete \
--stack-name edge-prod-a001-us-west-2-vpc-stack
Advanced Troubleshooting
Enable Debug Logging
# Set verbose logging
docker run --rm \
-v ~/.aws:/home/vpcuser/.aws:ro \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/reports:/app/reports \
-v $(pwd)/vpc/templates:/app/templates \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action create-vpc \
--verbose debug
Inspect Container
# Run interactive shell
docker run --rm -it \
-v $(pwd)/vpc/configs:/app/configs:ro \
--entrypoint /bin/bash \
vpc-provisioner:latest
# Inside container
ls -la /app/
cat /app/configs/your-config.yaml
python -m vpc_provisioner.cli --help
Check AWS API Calls
# Look up recent CreateVpc calls recorded by CloudTrail
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=CreateVpc \
--max-results 10
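# Check for errors (the error code is stored inside the CloudTrailEvent JSON string)
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=CreateVpc \
--query "Events[?contains(CloudTrailEvent, 'errorCode')].CloudTrailEvent"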
Network Connectivity
# Test AWS endpoint connectivity
curl -I https://ec2.us-west-2.amazonaws.com
# Test from container
docker run --rm \
--entrypoint /bin/bash \
vpc-provisioner:latest \
-c "curl -I https://ec2.us-west-2.amazonaws.com"
Review Generated Files
# Paths are relative to the directory used for $(pwd) in the docker commands
# Check generated IAM policy
cat vpc/policies/edge-prod-a001-us-west-2-vpc-iam-policy.json
# Check generated CloudFormation template
cat vpc/templates/edge-prod-a001-us-west-2-vpc-template.yaml
# Check execution logs
cat vpc/reports/*.log
Getting Help
Collect Diagnostic Information
# System info
docker --version
aws --version
uname -a
# AWS identity
aws sts get-caller-identity
# Configuration (remove account IDs and credentials before sharing)
cat vpc/configs/your-config.yaml
# Error output
docker run --rm \
-v $(pwd)/vpc/configs:/app/configs:ro \
-v $(pwd)/vpc/reports:/app/reports \
vpc-provisioner:latest \
--config edge-prod-a001-us-west-2-vpc.yaml \
--action validate-config 2>&1 | tee error.log
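Sanitizing a configuration file or error log before sharing it can be partly automated. The sketch below masks 12-digit AWS account IDs and access key IDs that start with AKIA or ASIA. It is a hypothetical helper rather than a feature of the tool, the patterns are assumptions about what counts as sensitive, and the output should still be reviewed by hand.

```python
import re


def sanitize_config_text(text: str) -> str:
    """Mask account IDs and access key IDs in text meant for a support request."""
    # 12-digit account IDs: keep the last four digits for reference.
    text = re.sub(r"\b\d{8}(\d{4})\b", r"********\1", text)
    # Access key IDs: keep only the four-letter prefix.
    text = re.sub(r"\b(AKIA|ASIA)[A-Z0-9]{16}\b", r"\1****************", text)
    return text
```

CIDR blocks and region names are left untouched, so the sanitized file remains useful for diagnosing the problem.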
Contact Support
Include in support request:
Docker version
AWS region
Sanitized configuration file
Complete error message
Steps to reproduce
Expected vs actual behavior
CloudFormation stack events (if applicable)
See SUPPORT.md for contact information.
Additional Resources
CONFIGURATION.md - Configuration reference
IAM_PERMISSIONS.md - Required permissions
USER_GUIDE.md - Complete command reference
SUPPORT.md - Support information
AWS VPC Troubleshooting: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-troubleshooting.html
AWS CloudFormation Troubleshooting: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/troubleshooting.html