Troubleshooting

Quick Diagnostics

# Check AWS credentials
aws sts get-caller-identity

# Check Docker version
docker --version

# Test AWS VPC access
aws ec2 describe-vpcs --region us-west-2

# Validate configuration
docker run --rm \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action validate-config

Common Pitfalls

1. Not quoting account_id in YAML

  • ❌ account_id: 123456789012 (parsed as an integer, so an ID that starts with 0 can lose its leading zeros)

  • ✅ account_id: "123456789012" (kept as a string, format preserved)

2. Skipping configuration validation

  • Always run validate-config before AWS operations

  • Catches most errors before deployment, which saves time and avoids failed stacks

3. Not reviewing generated templates

  • Review CloudFormation templates before create-vpc

  • Use validate-prov-template to catch reference errors locally

  • Prevents unexpected resource creation

4. Using access keys in production

  • ❌ Environment variables with long-lived access keys

  • ✅ IAM roles (EC2/ECS) or AWS profiles with MFA

  • Access keys are visible in process lists and Docker history

5. Overlapping CIDR blocks

  • Plan CIDR allocation before deployment

  • Document CIDR usage in a central registry

  • Ensure new VPC CIDRs don't overlap with existing VPCs or on-premises networks
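
The overlap check in pitfall 5 can be scripted before a new VPC is planned. The following is a minimal sketch using Python's standard `ipaddress` module; the `registry` dictionary and its entries are hypothetical stand-ins for whatever central CIDR registry you maintain, and `find_cidr_conflicts` is an illustrative helper, not part of the tool.

```python
import ipaddress


def find_cidr_conflicts(new_cidr, registry):
    """Return the names of registry entries whose CIDR overlaps new_cidr."""
    candidate = ipaddress.ip_network(new_cidr)
    return [
        name
        for name, cidr in registry.items()
        if candidate.overlaps(ipaddress.ip_network(cidr))
    ]


# Hypothetical registry of already allocated ranges (illustrative values only).
registry = {
    "edge-prod-vpc": "10.0.0.0/16",
    "edge-dev-vpc": "10.1.0.0/16",
    "on-prem-dc": "192.168.0.0/16",
}

print(find_cidr_conflicts("10.1.128.0/17", registry))  # ['edge-dev-vpc']
print(find_cidr_conflicts("10.2.0.0/16", registry))    # []
```

An empty list means the candidate range is free according to the registry; it says nothing about ranges that were never recorded there.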


Common Errors

AWS Credentials

Error: Unable to locate credentials

Error: Unable to locate credentials. You can configure credentials by running "aws configure".

Solution:

# Option 1: Environment variables
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_DEFAULT_REGION=us-west-2

# Option 2: AWS profile
export AWS_PROFILE=default

# Verify
aws sts get-caller-identity

Error: The security token included in the request is invalid

Error: The security token included in the request is invalid

Causes:

  • Expired temporary credentials

  • Invalid access key

  • Credentials from different account

Solution:

# Request new temporary credentials (needs valid long-term credentials)
aws sts get-session-token

# Or inspect the long-term credentials configured locally
cat ~/.aws/credentials

# Verify current identity
aws sts get-caller-identity

Error: Access Denied

Error: An error occurred (UnauthorizedOperation) when calling the CreateVpc operation

Solution:

# Check current user
aws sts get-caller-identity

# Generate IAM policy
docker run --rm \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/policies:/app/policies \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action create-policy

# Review and attach generated policy (written to the mounted vpc/policies directory)
cat vpc/policies/edge-prod-a001-us-west-2-vpc-iam-policy.json

# See IAM_PERMISSIONS.md for required permissions

Configuration Errors

Error: Invalid CIDR block

Error: Invalid CIDR block format

Solution:

# Valid
vpc:
  cidr_block: 10.0.0.0/16

# Invalid: block too large (prefix shorter than /16)
vpc:
  cidr_block: 10.0.0.0/8

# Invalid: block too small (prefix longer than /28)
vpc:
  cidr_block: 10.0.0.0/29
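
The size limits in the example above can be checked locally before running the tool. This is a sketch under the assumption that the tool follows the AWS rule that a VPC IPv4 block must be between /16 and /28; `check_vpc_cidr` is a hypothetical helper and does not reproduce the tool's own validation.

```python
import ipaddress


def check_vpc_cidr(cidr):
    """Return None if cidr looks usable as a VPC block, else a short reason."""
    try:
        # strict=True rejects blocks with host bits set, such as 10.0.1.0/16.
        network = ipaddress.ip_network(cidr, strict=True)
    except ValueError as exc:
        return f"not a valid CIDR: {exc}"
    if network.version != 4:
        return "this sketch only covers IPv4"
    if network.prefixlen < 16:
        return "too large: prefix must be /16 or longer"
    if network.prefixlen > 28:
        return "too small: prefix must be /28 or shorter"
    return None


for cidr in ("10.0.0.0/16", "10.0.0.0/8", "10.0.0.0/29", "10.0.1.0/16"):
    print(cidr, "->", check_vpc_cidr(cidr) or "ok")
```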

Error: Subnet CIDR not within VPC CIDR

Error: Subnet CIDR must be within VPC CIDR range

Solution:

# VPC: 10.0.0.0/16

# Valid subnet
vpc:
  subnets:
    public:
      - cidr: 10.0.1.0/24  # Within VPC CIDR

# Invalid subnet
vpc:
  subnets:
    public:
      - cidr: 192.168.1.0/24  # Outside VPC CIDR
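
Containment can be verified with the standard library before the configuration is submitted. The sketch below assumes IPv4 blocks and Python 3.7 or later (for `subnet_of`); `subnets_outside_vpc` is an illustrative name, not a function of the tool.

```python
import ipaddress


def subnets_outside_vpc(vpc_cidr, subnet_cidrs):
    """Return the subnet CIDRs that are not fully contained in vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [
        cidr
        for cidr in subnet_cidrs
        if not ipaddress.ip_network(cidr).subnet_of(vpc)
    ]


print(subnets_outside_vpc("10.0.0.0/16", ["10.0.1.0/24", "192.168.1.0/24"]))
# ['192.168.1.0/24']
```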

Error: Overlapping subnet CIDRs

Error: Subnet CIDRs cannot overlap

Solution:

# Wrong
vpc:
  subnets:
    public:
      - cidr: 10.0.1.0/24
    private:
      - cidr: 10.0.1.0/24  # Overlaps!

# Correct
vpc:
  subnets:
    public:
      - cidr: 10.0.1.0/24
    private:
      - cidr: 10.0.11.0/24  # No overlap
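
Identical CIDRs are easy to spot, but partial overlaps (for example a /23 that covers a /24) are not. The following sketch compares every pair of subnets; `overlapping_pairs` is a hypothetical helper and the inputs are illustrative.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(cidrs):
    """Return every pair of CIDRs in the list that share at least one address."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


print(overlapping_pairs(["10.0.1.0/24", "10.0.11.0/24"]))  # []
print(overlapping_pairs(["10.0.0.0/23", "10.0.1.0/24"]))
# [('10.0.0.0/23', '10.0.1.0/24')]
```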

Error: Configuration validation failed

Error: Configuration validation failed: 'client' is a required property

Solution: Ensure configuration has all three required sections:

client:
  company_name: Edge Corp
  company_prefix: edge
  account_id: "123456789012"
  tenant_id: "a001"

environment:
  env: prod
  region: us-west-2

vpc:
  vpc_name_override: ""
  cidr_block: 10.0.0.0/16
  availability_zones:
    - us-west-2a
  subnets:
    public:
      - name: public-subnet-1
        cidr: 10.0.1.0/24
        az: us-west-2a
  internet_gateway:
    enabled: true
  nat_gateway:
    enabled: false
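
A quick pre-check for the three top-level sections might look like the sketch below. It assumes the configuration has already been loaded into a Python dictionary (the loading step is omitted) and only looks at the top-level keys; the tool's own schema validation covers much more.

```python
REQUIRED_SECTIONS = ("client", "environment", "vpc")


def missing_sections(config):
    """Return the required top-level sections that are absent from config."""
    return [section for section in REQUIRED_SECTIONS if section not in config]


# Illustrative configuration with the 'client' section left out.
config = {
    "environment": {"env": "prod", "region": "us-west-2"},
    "vpc": {"cidr_block": "10.0.0.0/16"},
}
print(missing_sections(config))  # ['client']
```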

Error: Invalid availability zone

Error: Availability zone not in list

Solution:

# Wrong
vpc:
  availability_zones:
    - us-west-2a
  subnets:
    public:
      - az: us-west-2c  # Not in AZ list!

# Correct
vpc:
  availability_zones:
    - us-west-2a
    - us-west-2c
  subnets:
    public:
      - az: us-west-2c  # In AZ list
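
The rule illustrated above is that every subnet's `az` must appear in `availability_zones`. A sketch of that check over an already loaded `vpc` section follows; the function name and the returned tuple shape are assumptions made for illustration.

```python
def subnets_with_unlisted_az(vpc):
    """Return (tier, subnet name, az) for subnets whose az is not listed."""
    listed = set(vpc.get("availability_zones", []))
    problems = []
    for tier, subnets in vpc.get("subnets", {}).items():
        for subnet in subnets:
            if subnet.get("az") not in listed:
                problems.append((tier, subnet.get("name"), subnet.get("az")))
    return problems


# Mirrors the "Wrong" example: us-west-2c is used but not listed.
vpc = {
    "availability_zones": ["us-west-2a"],
    "subnets": {"public": [{"name": "public-subnet-1", "az": "us-west-2c"}]},
}
print(subnets_with_unlisted_az(vpc))
# [('public', 'public-subnet-1', 'us-west-2c')]
```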

Error: Account ID not quoted

Error: Account ID lost leading zeros

Solution:

# Wrong (parsed as an integer; an ID starting with 0 can lose its leading zeros)
client:
  account_id: 123456789012

# Correct (kept as a string, digits preserved exactly)
client:
  account_id: "123456789012"
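
To see why the quotes matter: once an ID is treated as a number, its leading zeros are gone and it is no longer 12 characters long. The sketch below shows the effect and a strict check that accepts only a 12-digit string. `is_valid_account_id` is a hypothetical helper and the sample ID is made up.

```python
import re


def is_valid_account_id(value):
    """True only for a string of exactly 12 digits."""
    return isinstance(value, str) and re.fullmatch(r"[0-9]{12}", value) is not None


quoted = "012345678901"        # what a quoted YAML value yields
as_number = int(quoted)        # what numeric handling does to the same digits

print(is_valid_account_id(quoted))     # True
print(is_valid_account_id(as_number))  # False
print(str(as_number))                  # 12345678901
```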

CloudFormation Errors

Error: Stack already exists

Error: Stack [edge-prod-a001-us-west-2-vpc-stack] already exists

Solution:

# Option 1: Delete existing stack first
docker run --rm \
  -v ~/.aws:/home/vpcuser/.aws:ro \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action delete-vpc \
  --force

# Option 2: Use a different VPC name (set in the configuration file)
vpc:
  vpc_name_override: "my-vpc-v2"

Error: CloudFormation rollback

Error: Stack creation failed and rolled back

Solution:

# Check CloudFormation events
aws cloudformation describe-stack-events \
  --stack-name edge-prod-a001-us-west-2-vpc-stack \
  --max-items 20

# Common causes:
# 1. Insufficient IAM permissions
# 2. Invalid CIDR blocks
# 3. Overlapping subnets
# 4. Invalid availability zones
# 5. NAT Gateway without Internet Gateway

# Review logs (written to the mounted vpc/reports directory)
cat vpc/reports/*.log
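
After a rollback, the event list is noisy and the earliest failure is usually the one worth reading. The sketch below filters the JSON printed by `aws cloudformation describe-stack-events --output json` and returns the failures oldest first. It relies on that command listing events newest first; `failed_events` and the sample events are illustrative.

```python
import json


def failed_events(describe_output):
    """Return (resource, status, reason) for *_FAILED events, oldest first."""
    events = json.loads(describe_output)["StackEvents"]
    failures = [e for e in events if e["ResourceStatus"].endswith("_FAILED")]
    # The CLI returns newest first, so reverse to get chronological order.
    return [
        (e["LogicalResourceId"], e["ResourceStatus"], e.get("ResourceStatusReason", ""))
        for e in reversed(failures)
    ]


# Made-up sample in the shape of the CLI output (newest event first).
sample = json.dumps({"StackEvents": [
    {"LogicalResourceId": "VpcStack", "ResourceStatus": "ROLLBACK_IN_PROGRESS"},
    {"LogicalResourceId": "PublicSubnet1", "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "Resource creation cancelled"},
    {"LogicalResourceId": "NatGateway1", "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "Sample reason text"},
]})

for resource, status, reason in failed_events(sample):
    print(resource, status, reason)
```

The first line printed is the earliest failure; later "Resource creation cancelled" entries are typically side effects of it.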

Error: VPC limit exceeded

Error: VpcLimitExceeded: The maximum number of VPCs has been reached

Solution:

# Check the VPCs-per-Region quota
aws service-quotas get-service-quota \
  --service-code vpc \
  --quota-code L-F678F1CE

# List existing VPCs
aws ec2 describe-vpcs --region us-west-2

# Delete unused VPCs or request limit increase
aws service-quotas request-service-quota-increase \
  --service-code vpc \
  --quota-code L-F678F1CE \
  --desired-value 10

Template Validation Errors

Error: Template file not found

Error: Template file not found: templates/edge-prod-a001-us-west-2-vpc-template.yaml

Solution: Generate the template first, or let validate-prov-template auto-generate it:

docker run --rm \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/templates:/app/templates \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action validate-prov-template

Note: If the template does not exist, it will be generated automatically before validation.


Error: Invalid !Ref target

Error: !Ref target 'InvalidResource' not found in Resources or Parameters

Causes:

  • Template references a resource that doesn't exist

  • Typo in resource name

  • Template was manually edited

Solution: Regenerate the template from configuration:

docker run --rm \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/templates:/app/templates \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action create-prov-template

Change Preview Errors

Error: Stack does not exist for show-changes

Error: Stack [edge-prod-a001-us-west-2-vpc-stack] does not exist

Cause: show-changes requires a deployed stack to compare against.

Solution: Deploy the stack first, then preview changes:

# Deploy first
docker run --rm \
  -v ~/.aws:/home/vpcuser/.aws:ro \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/templates:/app/templates \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action create-vpc \
  --force

# Then preview changes
docker run --rm \
  -v ~/.aws:/home/vpcuser/.aws:ro \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/templates:/app/templates \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action show-changes

Error: No changes detected

Symptoms: show-changes reports no pending changes.

Cause: The deployed stack matches the current template.

Solution: This is expected when no configuration changes have been made. Modify your configuration or template, then re-run show-changes.


Drift Detection Errors

Error: Stack does not exist for check-drift

Error: Stack [edge-prod-a001-us-west-2-vpc-stack] does not exist

Cause: check-drift requires a deployed stack to detect drift against.

Solution: Deploy the stack first:

docker run --rm \
  -v ~/.aws:/home/vpcuser/.aws:ro \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/templates:/app/templates \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action create-vpc \
  --force

Error: Drift detection timeout

Symptoms: Drift detection takes too long or times out.

Causes:

  • Large number of resources in the stack (subnets, NAT gateways, route tables)

  • AWS API throttling

Solution:

# Check drift detection status manually
aws cloudformation describe-stack-drift-detection-status \
  --stack-drift-detection-id <detection-id>

# Retry
docker run --rm \
  -v ~/.aws:/home/vpcuser/.aws:ro \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action check-drift
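
When throttling is the cause, polling the detection status with growing pauses is gentler than retrying immediately. The sketch below shows one way to do that. `get_status` is a hypothetical callable you would supply; it is expected to return the `DetectionStatus` value from `describe-stack-drift-detection-status`. The attempt and delay defaults are arbitrary assumptions.

```python
import time

TERMINAL_STATUSES = {"DETECTION_COMPLETE", "DETECTION_FAILED"}


def wait_for_drift_detection(get_status, max_attempts=8, base_delay=2.0,
                             max_delay=30.0, sleep=time.sleep):
    """Poll get_status() with exponential backoff until a terminal status."""
    delay = base_delay
    for attempt in range(max_attempts):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the pause, up to a cap
    raise TimeoutError(f"drift detection still running after {max_attempts} checks")


# Demo with canned statuses and a recording sleep, so nothing actually waits.
canned = iter(["DETECTION_IN_PROGRESS", "DETECTION_IN_PROGRESS", "DETECTION_COMPLETE"])
pauses = []
print(wait_for_drift_detection(lambda: next(canned), sleep=pauses.append))
print(pauses)  # [2.0, 4.0]
```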

NAT Gateway Errors

Error: NAT Gateway requires Internet Gateway

Error: NAT Gateway requires Internet Gateway to be enabled

Solution:

# Wrong
vpc:
  internet_gateway:
    enabled: false
  nat_gateway:
    enabled: true  # Requires IGW!

# Correct
vpc:
  internet_gateway:
    enabled: true
  nat_gateway:
    enabled: true

Error: NAT Gateway requires public subnet

Error: NAT Gateway requires at least one public subnet

Solution:

# Wrong
vpc:
  subnets:
    private:
      - name: private-subnet-1
        cidr: 10.0.11.0/24
        az: us-west-2a
  nat_gateway:
    enabled: true  # No public subnet!

# Correct
vpc:
  subnets:
    public:
      - name: public-subnet-1
        cidr: 10.0.1.0/24
        az: us-west-2a
    private:
      - name: private-subnet-1
        cidr: 10.0.11.0/24
        az: us-west-2a
  nat_gateway:
    enabled: true
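
Both NAT Gateway prerequisites described in this section (an enabled Internet Gateway and at least one public subnet) can be checked together on an already loaded `vpc` section. The following is a sketch with a hypothetical helper name; the messages are illustrative, not the tool's own.

```python
def nat_gateway_problems(vpc):
    """Return reasons why an enabled NAT Gateway cannot work in this config."""
    if not vpc.get("nat_gateway", {}).get("enabled", False):
        return []  # NAT Gateway is off, so there is nothing to check
    problems = []
    if not vpc.get("internet_gateway", {}).get("enabled", False):
        problems.append("internet_gateway.enabled must be true")
    if not vpc.get("subnets", {}).get("public"):
        problems.append("at least one public subnet is required")
    return problems


# Mirrors the two "Wrong" examples combined.
vpc = {
    "internet_gateway": {"enabled": False},
    "nat_gateway": {"enabled": True},
    "subnets": {"private": [{"name": "private-subnet-1", "cidr": "10.0.11.0/24"}]},
}
print(nat_gateway_problems(vpc))
```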

Error: Elastic IP limit exceeded

Error: AddressLimitExceeded: The maximum number of addresses has been reached

Solution:

# Check Elastic IP limit (addresses for use in a VPC)
aws ec2 describe-account-attributes \
  --attribute-names vpc-max-elastic-ips

# List existing Elastic IPs
aws ec2 describe-addresses --region us-west-2

# Release unused Elastic IPs
aws ec2 release-address --allocation-id eipalloc-xxxxx

# Or request a limit increase through Service Quotas

Docker Errors

Error: Cannot connect to Docker daemon

Error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock

Solution:

# Start Docker
sudo systemctl start docker

# Or Docker Desktop
# Start Docker Desktop application

# Verify
docker ps

Error: Permission denied accessing volume

Error: Permission denied: '/app/configs/your-config.yaml'

Solution:

# Check file permissions (same host directory that is mounted at /app/configs)
ls -la vpc/configs/

# Fix permissions
chmod 644 vpc/configs/your-config.yaml

# Or run with user mapping
docker run --rm --user $(id -u):$(id -g) \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action validate-config

Error: Volume mount not found

Error: No such file or directory: '/app/configs/your-config.yaml'

Solution:

# Verify file exists on host (same directory that is mounted at /app/configs)
ls -la vpc/configs/your-config.yaml

# Use absolute path
docker run --rm \
  -v ~/mlops-infra-suite/vpc/configs:/app/configs:ro \
  -v ~/mlops-infra-suite/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action validate-config

# Or use $(pwd)
docker run --rm \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action validate-config

Deletion Errors

Error: VPC has dependencies

Error: DependencyViolation: The vpc has dependencies and cannot be deleted

Solution:

# Use CloudFormation delete (recommended)
docker run --rm \
  -v ~/.aws:/home/vpcuser/.aws:ro \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action delete-vpc \
  --force

# CloudFormation handles dependency order automatically

Error: Stack deletion failed

Error: CloudFormation stack deletion failed

Solution:

# Check stack events
aws cloudformation describe-stack-events \
  --stack-name edge-prod-a001-us-west-2-vpc-stack

# Common causes:
# 1. Resources in use (EC2 instances, RDS, etc.)
# 2. Manual changes outside CloudFormation
# 3. Insufficient permissions

# Manually delete blocking resources first
aws ec2 describe-instances --filters "Name=vpc-id,Values=vpc-xxxxx"

# Then retry stack deletion

Error: NAT Gateway still deleting

Error: NAT Gateway is still in 'deleting' state

Solution:

# NAT Gateway deletion can take 5-10 minutes
# Check the current state until it reports "deleted"
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-xxxxx \
  --query 'NatGateways[].State'

# Possible states: pending | available | deleting | deleted | failed

Cost Estimation Errors

Error: Template not found

Message: Template not found. Run 'create-prov-template' first.

Cause: The cost-traffic or cost-estimate action requires a generated CloudFormation template.

Solution:

# Generate the template first
--action create-prov-template

# Then run cost estimation
--action cost-traffic
--action cost-estimate

Error: Traffic file not found

Message: Traffic file not found. Run 'cost-traffic' first.

Cause: The cost-estimate action requires a traffic assumptions file.

Solution:

# Generate traffic assumptions
--action cost-traffic

# Edit the file to match your expected usage
# Then run cost estimate
--action cost-estimate

Error: Failed to refresh pricing

Message: Failed to refresh pricing: ...

Cause: The cost-refresh-prices action could not reach the AWS Pricing API. Common reasons: no AWS credentials, no internet access, or insufficient permissions.

Solution:

  • Verify AWS credentials are configured

  • Ensure network connectivity to AWS APIs

  • The tool will use built-in pricing data as fallback, so cost-estimate still works without refreshing


License Validation Errors

Error: License validation failed

Error: AWS Marketplace subscription not found

Solution for Testing:

# Run with license validation (requires AWS Marketplace subscription)
docker run --rm \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action validate-config

Solution for Production:

  1. Verify AWS Marketplace subscription is active

  2. Check IAM permissions for AWS Marketplace

  3. Contact AWS Marketplace support


Performance Issues

Slow CloudFormation Stack Creation

Symptoms: Stack creation takes longer than expected (>10 minutes)

Causes:

  • NAT Gateway creation (5-10 minutes)

  • Multiple availability zones

  • AWS API throttling

Solution:

# Monitor stack events
aws cloudformation describe-stack-events \
  --stack-name edge-prod-a001-us-west-2-vpc-stack

# Check AWS service health
curl https://status.aws.amazon.com/

# Wait for completion (can take 5-15 minutes)
aws cloudformation wait stack-create-complete \
  --stack-name edge-prod-a001-us-west-2-vpc-stack

Timeout Errors

Symptoms: Operation times out before completion

Solution:

# Check CloudFormation stack status
aws cloudformation describe-stacks \
  --stack-name edge-prod-a001-us-west-2-vpc-stack

# If stack is still creating, wait for completion
aws cloudformation wait stack-create-complete \
  --stack-name edge-prod-a001-us-west-2-vpc-stack

Advanced Troubleshooting

Enable Debug Logging

# Set verbose logging
docker run --rm \
  -v ~/.aws:/home/vpcuser/.aws:ro \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/reports:/app/reports \
  -v $(pwd)/vpc/templates:/app/templates \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action create-vpc \
  --verbose debug

Inspect Container

# Run interactive shell
docker run --rm -it \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  --entrypoint /bin/bash \
  vpc-provisioner:latest

# Inside container
ls -la /app/
cat /app/configs/your-config.yaml
python -m vpc_provisioner.cli --help

Check AWS API Calls

# Look up recent CreateVpc calls recorded by CloudTrail
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=CreateVpc \
  --max-results 10

# Show only calls that failed (the error code is inside the CloudTrailEvent JSON string)
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=CreateVpc \
  --query "Events[?contains(CloudTrailEvent, 'errorCode')]"
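
In `lookup-events` output, each entry carries the full event record as a JSON string in its `CloudTrailEvent` field, and failed calls have an `errorCode` key inside that record. The sketch below decodes the string and extracts the error details from saved CLI output; `failed_calls` and the sample event are illustrative.

```python
import json


def failed_calls(lookup_output):
    """Return (event name, error code, error message) for failed API calls."""
    results = []
    for event in json.loads(lookup_output)["Events"]:
        detail = json.loads(event["CloudTrailEvent"])  # nested JSON string
        if "errorCode" in detail:
            results.append(
                (event["EventName"], detail["errorCode"], detail.get("errorMessage", ""))
            )
    return results


# Made-up sample in the shape of `aws cloudtrail lookup-events --output json`.
sample = json.dumps({"Events": [
    {"EventName": "CreateVpc",
     "CloudTrailEvent": json.dumps({"eventName": "CreateVpc"})},
    {"EventName": "CreateVpc",
     "CloudTrailEvent": json.dumps({"eventName": "CreateVpc",
                                    "errorCode": "Client.UnauthorizedOperation",
                                    "errorMessage": "Sample message"})},
]})
print(failed_calls(sample))
```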

Network Connectivity

# Test AWS endpoint connectivity
curl -I https://ec2.us-west-2.amazonaws.com

# Test from container
docker run --rm \
  --entrypoint /bin/bash \
  vpc-provisioner:latest \
  -c "curl -I https://ec2.us-west-2.amazonaws.com"

Review Generated Files

# Check generated IAM policy
cat vpc/policies/edge-prod-a001-us-west-2-vpc-iam-policy.json

# Check generated CloudFormation template
cat vpc/templates/edge-prod-a001-us-west-2-vpc-template.yaml

# Check execution logs
cat vpc/reports/*.log

Getting Help

Collect Diagnostic Information

# System info
docker --version
aws --version
uname -a

# AWS identity
aws sts get-caller-identity

# Configuration (remove account IDs and other sensitive values before sharing)
cat vpc/configs/your-config.yaml

# Error output
docker run --rm \
  -v $(pwd)/vpc/configs:/app/configs:ro \
  -v $(pwd)/vpc/reports:/app/reports \
  vpc-provisioner:latest \
  --config edge-prod-a001-us-west-2-vpc.yaml \
  --action validate-config 2>&1 | tee error.log
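
Before attaching a configuration file or `error.log` to a support request, account IDs can be masked automatically. The sketch below masks any standalone 12-digit number and keeps its last four digits. It is a rough assumption about what counts as sensitive, so still review the result by hand.

```python
import re

# A standalone run of exactly 12 digits, which is how AWS account IDs appear.
ACCOUNT_ID_PATTERN = re.compile(r"(?<![0-9])[0-9]{8}([0-9]{4})(?![0-9])")


def sanitize(text):
    """Mask 12-digit numbers, keeping the last four digits for reference."""
    return ACCOUNT_ID_PATTERN.sub(r"********\1", text)


print(sanitize('account_id: "123456789012"'))  # account_id: "********9012"
print(sanitize("cidr_block: 10.0.0.0/16"))     # unchanged
```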

Contact Support

Include in support request:

  1. Docker version

  2. AWS region

  3. Sanitized configuration file

  4. Complete error message

  5. Steps to reproduce

  6. Expected vs actual behavior

  7. CloudFormation stack events (if applicable)

See SUPPORT.md for contact information.


Additional Resources

  • CONFIGURATION.md - Configuration reference

  • IAM_PERMISSIONS.md - Required permissions

  • USER_GUIDE.md - Complete command reference

  • SUPPORT.md - Support information

  • AWS VPC Troubleshooting: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-troubleshooting.html

  • AWS CloudFormation Troubleshooting: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/troubleshooting.html