Troubleshooting

Quick Diagnostics

Check Configuration

docker run --rm \
  -v ~/.aws:/home/sguser/.aws:ro \
  -v $(pwd)/sg/configs:/app/configs:ro \
  -v $(pwd)/sg/reports:/app/reports \
  sg-provisioner:latest \
  -con my-config.yaml -act validate-config

Check Template

docker run --rm \
  -v ~/.aws:/home/sguser/.aws:ro \
  -v $(pwd)/sg/configs:/app/configs:ro \
  -v $(pwd)/sg/templates:/app/templates \
  -v $(pwd)/sg/reports:/app/reports \
  sg-provisioner:latest \
  -con my-config.yaml -act validate-prov-template

Check AWS Access

aws sts get-caller-identity
aws ec2 describe-security-groups --max-results 5 --region us-west-2

Check VPC Exists

aws ssm get-parameter --name /vpc/my-vpc-name/VPCId --region us-west-2

Enable Verbose Logging

docker run --rm \
  -v ~/.aws:/home/sguser/.aws:ro \
  -v $(pwd)/sg/configs:/app/configs:ro \
  -v $(pwd)/sg/reports:/app/reports \
  sg-provisioner:latest \
  -con my-config.yaml -act <action> --verbose debug
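Most commands in this guide repeat the same docker run flags. While troubleshooting, a small shell function can cut down on that. This is a sketch and is not part of SG Provisioner. The function name sgp and the SG_DRY_RUN variable are hypothetical, and the sketch assumes it is harmless to mount every sg/ subdirectory for every action.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sgp <config-file> <action> [extra flags...]
# Mounts all sg/ subdirectories so one function covers every action.
# Set SG_DRY_RUN=1 to print the command instead of running it.
sgp() {
  [ "$#" -ge 2 ] || { echo "usage: sgp <config-file> <action> [flags...]" >&2; return 1; }
  local config="$1" action="$2"
  shift 2
  # Rebuild the positional parameters as the full docker command line
  set -- docker run --rm \
    -v "$HOME/.aws:/home/sguser/.aws:ro" \
    -v "$PWD/sg/configs:/app/configs:ro" \
    -v "$PWD/sg/templates:/app/templates" \
    -v "$PWD/sg/policies:/app/policies" \
    -v "$PWD/sg/reports:/app/reports" \
    sg-provisioner:latest \
    -con "$config" -act "$action" "$@"
  if [ -n "${SG_DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Example: sgp my-config.yaml validate-config --verbose debug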

Configuration Errors

Invalid config file name

Error: Invalid config file name: path/to/file.yaml. File name must not contain path separators

Cause: You passed a path instead of just the filename.

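Solution: Run the command from the directory that contains sg/ (the $(pwd) mounts depend on it) and pass only the file name. The container looks for it under /app/configs: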

docker run --rm \
  -v ~/.aws:/home/sguser/.aws:ro \
  -v $(pwd)/sg/configs:/app/configs:ro \
  -v $(pwd)/sg/reports:/app/reports \
  sg-provisioner:latest \
  -con my-config.yaml -act validate-config
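You can catch this error, and a missing file, before starting the container by running a quick local check. The function name below is hypothetical. It assumes you run it from the directory that contains sg/.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: check_config_name <file-name>
# Fails if the argument contains a path separator or the file is not in sg/configs/.
check_config_name() {
  case "$1" in
    */* | *\\*)
      echo "Pass only the file name, e.g.: $(basename "$1")" >&2
      return 1
      ;;
  esac
  if [ ! -f "sg/configs/$1" ]; then
    echo "sg/configs/$1 not found (is it in the mounted directory?)" >&2
    return 1
  fi
}
```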

Schema validation failed

Error: Configuration validation failed: '<field>' is a required property

Cause: A required field is missing from the config file.

Solution: Ensure all required fields are present:

  • client: company_name, company_prefix, account_id, tenant_id

  • environment: env, region

  • security_groups: scenario, vpc_source
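To start from a file that already contains every required field, you can write a skeleton and then fill in the values. This sketch makes a few assumptions. It assumes client and environment nest their fields the same way security_groups does elsewhere in this guide. Every value is a placeholder. It uses direct mode only because that mode's vpc_id field is documented here. The function name is hypothetical. Check the Configuration Reference for the exact formats of fields such as tenant_id and env.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a config skeleton containing every required field.
# All values are placeholders; replace them before running validate-config.
write_config_skeleton() {
  local file="${1:-sg/configs/my-config.yaml}"
  mkdir -p "$(dirname "$file")"
  cat > "$file" <<'EOF'
client:
  company_name: "Example Corp"
  company_prefix: example          # lowercase letters, digits, hyphens
  account_id: "123456789012"       # quoted, exactly 12 digits
  tenant_id: "<tenant-id>"
environment:
  env: prod
  region: us-west-2
security_groups:
  scenario: <scenario-name>        # a name from list-scenarios
  vpc_source: direct
  vpc_id: vpc-0abc123def456
EOF
}
```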

Invalid company_prefix

Error: 'MY_PREFIX' does not match '^[a-z][a-z0-9-]*$'

Cause: The prefix must start with a lowercase letter and contain only lowercase letters, digits, and hyphens.

Solution:

# ❌ Wrong
company_prefix: MY_PREFIX
company_prefix: 123abc

# ✅ Correct
company_prefix: myprefix
company_prefix: my-prefix

Invalid account_id

Error: '12345' does not match '^[0-9]{12}$'

Cause: The account ID must be a quoted string of exactly 12 digits.

Solution:

# ❌ Wrong
account_id: 12345         # Too short: must be exactly 12 digits
account_id: 123456789012  # Unquoted: YAML parses a number, and IDs starting with 0 lose their leading zeros

# ✅ Correct
account_id: "123456789012"
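The two patterns quoted in the errors above can be checked locally before you run validate-config. The sketch below uses POSIX case patterns that mirror ^[a-z][a-z0-9-]*$ and ^[0-9]{12}$. The function names are hypothetical, and the functions check only the value, not whether it is quoted in the YAML file.

```bash
#!/usr/bin/env bash
# valid_prefix <value>: mirrors ^[a-z][a-z0-9-]*$
valid_prefix() {
  case "$1" in
    [a-z]*) ;;                 # must start with a lowercase letter
    *) return 1 ;;
  esac
  case "$1" in
    *[!a-z0-9-]*) return 1 ;;  # any other character is rejected
  esac
  return 0
}

# valid_account_id <value>: mirrors ^[0-9]{12}$
valid_account_id() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;; # empty, or contains a non-digit
  esac
  [ "${#1}" -eq 12 ]           # exactly 12 characters long
}
```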

AWS Credential Issues

No credentials found

Error: Failed to initialize AWS clients: Unable to locate credentials

Cause: AWS credentials are not configured, or ~/.aws is not mounted into the container.

Solution (Docker):

docker run --rm \
  -v ~/.aws:/home/sguser/.aws:ro \
  ...

Solution (Local):

aws configure
# Or set environment variables:
export AWS_ACCESS_KEY_ID=<your-key>
export AWS_SECRET_ACCESS_KEY=<your-secret>
export AWS_DEFAULT_REGION=us-west-2

Access denied

Error: An error occurred (UnauthorizedOperation) when calling DescribeSecurityGroups

Cause: The IAM user or role lacks the required permissions.

Solution: Generate and attach the IAM policy:

docker run --rm \
  -v ~/.aws:/home/sguser/.aws:ro \
  -v $(pwd)/sg/configs:/app/configs:ro \
  -v $(pwd)/sg/policies:/app/policies \
  -v $(pwd)/sg/reports:/app/reports \
  sg-provisioner:latest \
  -con my-config.yaml -act create-policy

Then attach the generated JSON policy to your IAM user/role.
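The attach step can be scripted with the AWS CLI. This sketch assumes the generated policy is the newest *.json file in sg/policies/ and attaches it to a role as an inline policy. The function names and the default policy name sg-provisioner are hypothetical. To attach the policy to an IAM user instead, use aws iam put-user-policy with --user-name.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for attaching the policy produced by create-policy.

# Print the newest *.json file in a directory (default: sg/policies).
latest_policy() {
  ls -t "${1:-sg/policies}"/*.json 2>/dev/null | head -n 1
}

# attach_policy_to_role <role-name> [policy-name]
attach_policy_to_role() {
  local role="$1" name="${2:-sg-provisioner}" file
  file=$(latest_policy)
  if [ -z "$file" ]; then
    echo "No policy JSON found in sg/policies/" >&2
    return 1
  fi
  aws iam put-role-policy \
    --role-name "$role" \
    --policy-name "$name" \
    --policy-document "file://$file"
}
```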

Expired credentials

Error: ExpiredTokenException: The security token included in the request is expired

Solution: Refresh credentials:

aws sso login  # If using SSO
# Or regenerate temporary credentials

VPC Resolution Errors

VPC not found in Parameter Store

Error: VPC ID not found in Parameter Store at: /vpc/.../VPCId

Cause: The VPC hasn’t been deployed, or the path is wrong.

Solution:

  1. Deploy the VPC first using the VPC Provisioner

  2. Verify the path exists:

    aws ssm get-parameter --name /vpc/my-vpc-name/VPCId --region us-west-2
    
  3. Or switch to direct mode:

    security_groups:
      vpc_source: direct
      vpc_id: vpc-0abc123def456
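If you switch to direct mode, you can read the VPC ID from Parameter Store once you have the correct path. This sketch assumes the /vpc/<vpc-name>/VPCId layout used in the commands above. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Build the Parameter Store path assumed by this guide: /vpc/<vpc-name>/VPCId
vpc_param_path() {
  printf '/vpc/%s/VPCId\n' "$1"
}

# resolve_vpc_id <vpc-name> [region]: print only the parameter's value
resolve_vpc_id() {
  aws ssm get-parameter \
    --name "$(vpc_param_path "$1")" \
    --region "${2:-us-west-2}" \
    --query Parameter.Value \
    --output text
}
```

The printed value can go straight into vpc_id, e.g. vpc_id=$(resolve_vpc_id my-vpc-name).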
    

VPC does not exist

Error: Validation failed: VPC vpc-xxx does not exist in this region

Cause: The VPC ID is well-formed, but the VPC has been deleted or is in a different region.

Solution: Verify the VPC exists:

aws ec2 describe-vpcs --vpc-ids vpc-0abc123 --region us-west-2
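To find out whether the VPC lives in a different region, you can try each region in turn. This sketch loops over the regions that aws ec2 describe-regions returns, which by default are the regions enabled for the account. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# find_vpc_region <vpc-id>: print the first enabled region containing the VPC
find_vpc_region() {
  local vpc_id="$1" region
  for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
    if aws ec2 describe-vpcs --vpc-ids "$vpc_id" --region "$region" >/dev/null 2>&1; then
      echo "$region"
      return 0
    fi
  done
  echo "VPC $vpc_id not found in any enabled region" >&2
  return 1
}
```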

Unknown vpc_source

Error: Unknown vpc_source: <value>

Solution: Use either parameter-store or direct:

security_groups:
  vpc_source: parameter-store  # or: direct

CloudFormation Errors

Stack already exists

Error: Stack [globalbank-prod-c001-us-west-2-sg-stack] already exists

Cause: The security groups for this configuration are already deployed.

Solution:

  • Run show-changes to see what would change

  • Run delete-security-groups --force to remove the stack, then redeploy

  • Set the workload field to deploy a separate set of security groups
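Before you choose one of these options, it can help to check what state the existing stack is in. The following is a minimal sketch with a hypothetical function name. Pass it the stack name from the error message.

```bash
#!/usr/bin/env bash
# stack_status <stack-name> [region]: print the stack's current status
stack_status() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --region "${2:-us-west-2}" \
    --query 'Stacks[0].StackStatus' \
    --output text
}
```

If the status is ROLLBACK_COMPLETE, the original create failed, and CloudFormation only allows that stack to be deleted. In that case, delete the stack and deploy again instead of using show-changes.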

Stack creation failed

Error: Stack creation failed or timed out

Cause: CloudFormation couldn’t create one or more resources.

Solution:

  1. Check the CloudFormation console for detailed error events

  2. Common causes:

    • VPC doesn’t exist anymore

    • Security group limit reached (default: 2500 per region)

    • IAM permissions insufficient
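The same failure events that the console shows can be pulled from the CLI. This sketch filters the stack's events for any FAILED status and prints the reason CloudFormation recorded. Events are listed newest first. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# stack_failures <stack-name> [region]: show failed resource events and reasons
stack_failures() {
  aws cloudformation describe-stack-events \
    --stack-name "$1" \
    --region "${2:-us-west-2}" \
    --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[Timestamp,LogicalResourceId,ResourceStatusReason]" \
    --output table
}
```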

Resource limit exceeded

Error: The maximum number of security groups has been reached

Solution: Request a limit increase via AWS Service Quotas, or delete unused security groups.
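To find groups that may be safe to delete, you can list the non-default security groups that no network interface uses. This is a sketch with a hypothetical function name. A group in this list can still be referenced by another group's rules, and those references must be removed before deletion. Review each group before you delete it.

```bash
#!/usr/bin/env bash
# unused_security_groups [region]
# Lists non-default security groups not attached to any network interface.
unused_security_groups() {
  local region="${1:-us-west-2}" all in_use
  all=$(aws ec2 describe-security-groups --region "$region" \
    --query "SecurityGroups[?GroupName!='default'].GroupId" --output text | tr '\t' '\n' | sort -u)
  in_use=$(aws ec2 describe-network-interfaces --region "$region" \
    --query 'NetworkInterfaces[].Groups[].GroupId' --output text | tr '\t' '\n' | sort -u)
  if [ -z "$in_use" ]; then
    printf '%s\n' "$all"
  else
    # Drop every group ID that appears in the in-use list (exact, fixed-string match)
    printf '%s\n' "$all" | grep -vxF "$in_use"
  fi
}
```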

Scenario Errors

Scenario not found

Error: Scenario YAML not found: schemas/scenarios/my-scenario.yaml

Cause: The scenario name in the config doesn’t match any scenario file.

Solution:

docker run --rm \
  -v ~/.aws:/home/sguser/.aws:ro \
  -v $(pwd)/sg/configs:/app/configs:ro \
  -v $(pwd)/sg/reports:/app/reports \
  sg-provisioner:latest \
  -con my-config.yaml -act list-scenarios

Set scenario to the exact name from the list, without the .yaml extension.

Invalid rule in scenario

Error: Specify either 'source' or 'source_tier', not both

Cause: A rule has both source and source_tier set.

Solution: Use one or the other:

# ✅ CIDR-based
- protocol: tcp
  port: 443
  source: "0.0.0.0/0"

# ✅ Tier-based
- protocol: tcp
  port: 8080
  source_tier: web

Invalid port in override

Error: Port values must be between 1 and 65535

Solution: Ensure port values are integers from 1 to 65535:

overrides:
  app:
    port_overrides:
      - protocol: tcp
        old_port: 8080
        new_port: 8443  # Must be 1-65535
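A quick shell check can confirm that a value is in range before you put it into an override. The function name is hypothetical, and the function mirrors only the 1 to 65535 rule quoted above.

```bash
#!/usr/bin/env bash
# valid_port <value>: succeeds only for an integer from 1 to 65535
valid_port() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;   # empty or not all digits
  esac
  [ "$1" -ge 1 ] 2>/dev/null && [ "$1" -le 65535 ] 2>/dev/null
}
```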

Template Validation Errors

Unresolved reference

Error: Unresolved !Ref 'XyzSecurityGroup'

Cause: A rule references a tier that doesn’t exist in the scenario.

Solution: Check that source_tier / destination_tier values match actual tier names in the scenario.
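To see every dangling reference at once, you can run a rough text check against the generated YAML template in sg/templates/. This sketch is a heuristic with a hypothetical function name. It assumes short-form !Ref tags and logical IDs written as two-space-indented keys. It ignores AWS:: pseudo parameters, and it does not understand long-form Ref: or JSON templates.

```bash
#!/usr/bin/env bash
# unresolved_refs <template.yaml>
# Prints names used in "!Ref X" that never appear as a two-space-indented key.
unresolved_refs() {
  local tpl="$1" refs ids
  refs=$(mktemp)
  ids=$(mktemp)
  # Every !Ref target, minus pseudo parameters such as AWS::Region
  grep -o '!Ref [A-Za-z0-9:]*' "$tpl" | awk '{print $2}' | grep -v '^AWS::' | sort -u > "$refs"
  # Keys indented exactly two spaces (Resources/Parameters entries)
  sed -n 's/^  \([A-Za-z0-9][A-Za-z0-9]*\):.*/\1/p' "$tpl" | sort -u > "$ids"
  comm -23 "$refs" "$ids"
  rm -f "$refs" "$ids"
}
```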

Unknown resource type

Error: Unknown resource type 'AWS::EC2::Instance' on ResourceName

Cause: Template contains unexpected resource types.

Solution: SG templates should only contain:

  • AWS::EC2::SecurityGroup

  • AWS::EC2::SecurityGroupIngress

  • AWS::EC2::SecurityGroupEgress

  • AWS::SSM::Parameter
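To list any resource types in a template that fall outside these four, you can scan its Type: lines. This is a text-based sketch with a hypothetical function name. It looks only at AWS::* types, so a Parameters entry typed as, for example, AWS::EC2::VPC::Id would also be reported.

```bash
#!/usr/bin/env bash
# unexpected_resource_types <template.yaml>
# Prints each AWS::* type that is not one of the four expected SG template types.
unexpected_resource_types() {
  grep -E '^[[:space:]]+Type:[[:space:]]*AWS::' "$1" \
    | awk '{print $2}' \
    | sort -u \
    | grep -vxE 'AWS::EC2::SecurityGroup(Ingress|Egress)?|AWS::SSM::Parameter'
}
```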

Docker Errors

Permission denied on volumes

Error: Permission denied: '/app/reports/...'

Cause: The user inside the container cannot write to the mounted host directory.

Solution: Ensure host directories exist and are writable:

mkdir -p sg/{configs,policies,reports,templates,schemas}
chmod 777 sg/reports sg/policies sg/templates
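chmod 777 works, but it makes the directories world-writable. A narrower option is to give ownership to the user the container runs as. This sketch assumes the image includes the standard id utility, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# image_owner [image]: print "uid:gid" of the user the image runs as.
image_owner() {
  local image="${1:-sg-provisioner:latest}" uid gid
  uid=$(docker run --rm --entrypoint id "$image" -u) || return 1
  gid=$(docker run --rm --entrypoint id "$image" -g) || return 1
  echo "$uid:$gid"
}
```

Then run: sudo chown -R "$(image_owner)" sg/reports sg/policies sg/templates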

Config file not found

Error: Configuration file not found: /app/configs/my-config.yaml

Cause: The config file is not in the host directory mounted at /app/configs (sg/configs/).

Solution: Verify the mount and filename:

ls $(pwd)/sg/configs/  # Should show your config file
docker run --rm \
  -v $(pwd)/sg/configs:/app/configs:ro \
  ...

License validation failed

Error: License validation failed. Exiting.

Cause: No active AWS Marketplace subscription.

Solution: Subscribe to the SG Provisioner product on AWS Marketplace. Ensure the subscription is active and associated with the AWS account you are using.

Deletion Errors

Stack does not exist

Error: SG stack 'my-stack' does not exist

Cause: Stack was already deleted or never created.

Solution: No action needed — resources are already gone.

Delete failed — resources in use

Error: resource sg-xxx has a dependent object

Cause: The security group is still attached to a network interface (EC2 instance, Lambda function, RDS instance, etc.) or is referenced by a rule in another security group.

Solution:

  1. Identify what’s using the SG:

    aws ec2 describe-network-interfaces --filters Name=group-id,Values=sg-xxx --region us-west-2
    
  2. Remove the SG from those resources first

  3. Retry deletion
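Both kinds of dependency can be listed in one pass. This sketch uses the EC2 group-id, ip-permission.group-id, and egress.ip-permission.group-id filters. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# sg_dependents <sg-id> [region]: show what is blocking deletion of a security group
sg_dependents() {
  local sg="$1" region="${2:-us-west-2}"
  echo "== Network interfaces using $sg =="
  aws ec2 describe-network-interfaces --region "$region" \
    --filters "Name=group-id,Values=$sg" \
    --query 'NetworkInterfaces[].[NetworkInterfaceId,InterfaceType,Description]' \
    --output text
  echo "== Security groups referencing $sg in inbound rules =="
  aws ec2 describe-security-groups --region "$region" \
    --filters "Name=ip-permission.group-id,Values=$sg" \
    --query 'SecurityGroups[].[GroupId,GroupName]' --output text
  echo "== Security groups referencing $sg in outbound rules =="
  aws ec2 describe-security-groups --region "$region" \
    --filters "Name=egress.ip-permission.group-id,Values=$sg" \
    --query 'SecurityGroups[].[GroupId,GroupName]' --output text
}
```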

Force flag required

Error: Security group deletion requires --force flag

Solution: Add --force:

docker run --rm \
  -v ~/.aws:/home/sguser/.aws:ro \
  -v $(pwd)/sg/configs:/app/configs:ro \
  -v $(pwd)/sg/reports:/app/reports \
  sg-provisioner:latest \
  -con my-config.yaml -act delete-security-groups --force

Drift Detection Issues

No deployed stack found

Error: No deployed stack found for drift detection

Cause: Stack hasn’t been deployed yet.

Solution: Deploy first with create-security-groups --force, then check drift.

Drift detection timed out

Error: Drift detection timed out

Cause: AWS did not finish the drift analysis within 10 minutes.

Solution: Retry, because these timeouts are usually transient. If the timeouts persist, check the stack's drift status in the CloudFormation console.

Drift detected

Message: DRIFT DETECTED in stack

Cause: Security group rules were modified outside CloudFormation (for example, in the console or with the CLI).

Solution:

  1. Review the drift details in the output

  2. Either:

    • Update your config or scenario to match the desired state, regenerate the template, and redeploy

    • Or revert the manual changes: delete the stack with delete-security-groups --force, then redeploy with create-security-groups --force
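If you want to review drift outside the tool, you can run the same CloudFormation drift check directly. This sketch starts detection, polls every 10 seconds for up to 10 minutes, and then lists only resources whose status is MODIFIED or DELETED. The function name is hypothetical, and you pass it the stack name.

```bash
#!/usr/bin/env bash
# check_drift <stack-name> [region]
check_drift() {
  local stack="$1" region="${2:-us-west-2}" id status i=0
  id=$(aws cloudformation detect-stack-drift --stack-name "$stack" --region "$region" \
    --query StackDriftDetectionId --output text) || return 1
  # Poll every 10 s for up to 10 minutes
  while [ "$i" -lt 60 ]; do
    status=$(aws cloudformation describe-stack-drift-detection-status \
      --stack-drift-detection-id "$id" --region "$region" \
      --query DetectionStatus --output text)
    [ "$status" != "DETECTION_IN_PROGRESS" ] && break
    sleep 10
    i=$((i + 1))
  done
  echo "Detection status: $status"
  # Only resources changed or removed outside CloudFormation
  aws cloudformation describe-stack-resource-drifts --stack-name "$stack" --region "$region" \
    --stack-resource-drift-status-filters MODIFIED DELETED \
    --query 'StackResourceDrifts[].[LogicalResourceId,StackResourceDriftStatus]' \
    --output text
}
```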

Parameter Store Issues

Parameter not found after deployment

Cause: The stack deployed successfully, but writing its outputs to Parameter Store failed.

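Solution: Check manually. Include --recursive, because tier parameters such as /sg/my-sg-name/web/SecurityGroupId sit one level below the path and are not returned without it:

aws ssm get-parameters-by-path --path /sg/my-sg-name/ --recursive --region us-west-2

If the output is empty, the stack outputs were probably not stored. Check the deployment log in reports/.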

Stale parameters after deletion

Cause: Stack deleted but parameter cleanup failed.

Solution: Delete manually:

aws ssm delete-parameter --name /sg/my-sg-name/web/SecurityGroupId --region us-west-2
aws ssm delete-parameter --name /sg/my-sg-name/app/SecurityGroupId --region us-west-2
aws ssm delete-parameter --name /sg/my-sg-name/db/SecurityGroupId --region us-west-2
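Instead of deleting one tier at a time, you can loop over everything under the prefix. This sketch has a hypothetical function name and uses /sg/my-sg-name/ as the example prefix from above. It deletes without asking for confirmation, so first run aws ssm get-parameters-by-path with --recursive and confirm the list.

```bash
#!/usr/bin/env bash
# delete_sg_params <path-prefix> [region]
# Deletes every parameter below the prefix, including nested tier paths.
delete_sg_params() {
  local path="$1" region="${2:-us-west-2}" name
  for name in $(aws ssm get-parameters-by-path --path "$path" --recursive \
      --region "$region" --query 'Parameters[].Name' --output text); do
    echo "Deleting $name"
    aws ssm delete-parameter --name "$name" --region "$region"
  done
}
```

Example: delete_sg_params /sg/my-sg-name/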

Getting Help

  1. Check the logs in the reports/ directory (sg/reports/ on the host) for detailed error information

  2. Use --verbose debug for maximum logging detail

  3. Review the Configuration Reference for valid values

  4. Review the Scenarios Reference for scenario structure