Troubleshooting¶
Quick Diagnostics¶
Check Configuration¶
docker run --rm \
-v ~/.aws:/home/sguser/.aws:ro \
-v $(pwd)/sg/configs:/app/configs:ro \
-v $(pwd)/sg/reports:/app/reports \
sg-provisioner:latest \
-con my-config.yaml -act validate-config
Check Template¶
docker run --rm \
-v ~/.aws:/home/sguser/.aws:ro \
-v $(pwd)/sg/configs:/app/configs:ro \
-v $(pwd)/sg/templates:/app/templates \
-v $(pwd)/sg/reports:/app/reports \
sg-provisioner:latest \
-con my-config.yaml -act validate-prov-template
Check AWS Access¶
aws sts get-caller-identity
aws ec2 describe-security-groups --max-results 5 --region us-west-2
Check VPC Exists¶
aws ssm get-parameter --name /vpc/my-vpc-name/VPCId --region us-west-2
Enable Verbose Logging¶
docker run --rm \
-v ~/.aws:/home/sguser/.aws:ro \
-v $(pwd)/sg/configs:/app/configs:ro \
-v $(pwd)/sg/reports:/app/reports \
sg-provisioner:latest \
-con my-config.yaml -act <action> --verbose debug
Configuration Errors¶
Invalid config file name¶
Error: Invalid config file name: path/to/file.yaml. File name must not contain path separators
Cause: You passed a path instead of just the filename.
Solution: Run from the package directory and pass only the filename:
docker run --rm \
-v ~/.aws:/home/sguser/.aws:ro \
-v $(pwd)/sg/configs:/app/configs:ro \
-v $(pwd)/sg/reports:/app/reports \
sg-provisioner:latest \
-con my-config.yaml -act validate-config
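If the config path comes from a script or a variable, stripping the directory part before passing it to -con avoids this error. A minimal sketch; config_name is a hypothetical helper, not part of the tool:

```shell
# Hypothetical helper: reduce a path to the bare file name that -con expects.
config_name() {
    basename "$1"
}

# Prints: my-config.yaml
config_name sg/configs/my-config.yaml
```

The file itself still has to be present in the directory mounted at /app/configs.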
Schema validation failed¶
Error: Configuration validation failed: '<field>' is a required property
Cause: Missing required field in config.
Solution: Ensure all required fields are present:
client: company_name, company_prefix, account_id, tenant_id
environment: env, region
security_groups: scenario, vpc_source
Invalid company_prefix¶
Error: 'MY_PREFIX' does not match '^[a-z][a-z0-9-]*$'
Cause: The prefix must start with a lowercase letter and contain only lowercase letters, digits, and hyphens.
Solution:
# ❌ Wrong
company_prefix: MY_PREFIX
company_prefix: 123abc
# ✅ Correct
company_prefix: myprefix
company_prefix: my-prefix
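A prefix can be checked against the pattern from the error message before running the tool. This is a sketch only; valid_prefix is a hypothetical helper, and LC_ALL=C is set so that [a-z] matches ASCII lowercase letters only:

```shell
# Hypothetical pre-flight check using the pattern quoted in the error message.
valid_prefix() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9-]*$'
}

valid_prefix my-prefix && echo "my-prefix: ok"
valid_prefix MY_PREFIX || echo "MY_PREFIX: rejected"
```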
Invalid account_id¶
Error: '12345' does not match '^[0-9]{12}$'
Cause: The account ID must be exactly 12 digits and must be quoted so that YAML reads it as a string.
Solution:
# ❌ Wrong
account_id: 12345
account_id: 123456789012 # Unquoted may lose leading zeros
# ✅ Correct
account_id: "123456789012"
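The same kind of check works for the account ID. The sketch below tests only the value (12 digits); it cannot tell whether the value is quoted in the YAML file. valid_account_id is a hypothetical helper:

```shell
# Hypothetical check: succeed only if the argument is exactly 12 ASCII digits.
valid_account_id() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-9]{12}$'
}

# Example:
#   valid_account_id 123456789012 && echo ok
```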
AWS Credential Issues¶
No credentials found¶
Error: Failed to initialize AWS clients: Unable to locate credentials
Cause: AWS credentials not configured or not mounted in Docker.
Solution (Docker):
docker run --rm \
-v ~/.aws:/home/sguser/.aws:ro \
...
Solution (Local):
aws configure
# Or set environment variables:
export AWS_ACCESS_KEY_ID=<your-key>
export AWS_SECRET_ACCESS_KEY=<your-secret>
export AWS_DEFAULT_REGION=us-west-2
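When using the environment-variable route, a quick check shows which of the three variables above is still unset or empty. A sketch; missing_aws_env is a hypothetical helper and does not cover profiles created with aws configure:

```shell
# Hypothetical helper: print the name of each variable that is unset or empty.
# The body runs in a subshell so the loop variables do not leak.
missing_aws_env() (
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
)

# Example:
#   missing_aws_env    # prints nothing when all three are set
```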
Access denied¶
Error: An error occurred (UnauthorizedOperation) when calling DescribeSecurityGroups
Cause: IAM user/role lacks required permissions.
Solution: Generate and attach the IAM policy:
docker run --rm \
-v ~/.aws:/home/sguser/.aws:ro \
-v $(pwd)/sg/configs:/app/configs:ro \
-v $(pwd)/sg/policies:/app/policies \
-v $(pwd)/sg/reports:/app/reports \
sg-provisioner:latest \
-con my-config.yaml -act create-policy
Then attach the generated JSON policy to your IAM user/role.
Expired credentials¶
Error: ExpiredTokenException: The security token included in the request is expired
Solution: Refresh credentials:
aws sso login # If using SSO
# Or regenerate temporary credentials
VPC Resolution Errors¶
VPC not found in Parameter Store¶
Error: VPC ID not found in Parameter Store at: /vpc/.../VPCId
Cause: The VPC hasn’t been deployed, or the path is wrong.
Solution:
Deploy the VPC first using the VPC Provisioner
Verify the path exists:
aws ssm get-parameter --name /vpc/my-vpc-name/VPCId --region us-west-2
Or switch to direct mode:
security_groups:
  vpc_source: direct
  vpc_id: vpc-0abc123def456
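To see which VPC ID a parameter resolves to, the lookup can be wrapped so that it prints only the value. A sketch that assumes the /vpc/<name>/VPCId layout seen in the commands above; vpc_id_from_ssm is a hypothetical helper:

```shell
# Hypothetical helper: print the VPC ID stored for a VPC name.
# Usage: vpc_id_from_ssm VPC_NAME REGION
# Assumes the parameter lives at /vpc/<name>/VPCId.
vpc_id_from_ssm() {
    aws ssm get-parameter \
        --name "/vpc/$1/VPCId" \
        --region "$2" \
        --query 'Parameter.Value' \
        --output text
}

# Example:
#   vpc_id_from_ssm my-vpc-name us-west-2
```

The printed value is what would go into vpc_id when switching to direct mode.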
VPC does not exist¶
Error: Validation failed: VPC vpc-xxx does not exist in this region
Cause: The VPC ID has a valid format, but the VPC was deleted or is in a different region.
Solution: Verify the VPC exists:
aws ec2 describe-vpcs --vpc-ids vpc-0abc123 --region us-west-2
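If the region is in doubt, the same check can be repeated over a list of candidate regions. A sketch; find_vpc_region is a hypothetical helper, and the caller supplies the region list:

```shell
# Hypothetical helper: print the first region in which the VPC ID resolves.
# Usage: find_vpc_region VPC_ID REGION...
# Note: any failure (including a credential error) counts as "not found".
find_vpc_region() (
    vpc_id=$1
    shift
    for region in "$@"; do
        if aws ec2 describe-vpcs --vpc-ids "$vpc_id" --region "$region" \
            >/dev/null 2>&1; then
            echo "$region"
            exit 0
        fi
    done
    exit 1
)

# Example:
#   find_vpc_region vpc-0abc123 us-west-2 us-east-1
```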
Unknown vpc_source¶
Error: Unknown vpc_source: <value>
Solution: Use either parameter-store or direct:
security_groups:
  vpc_source: parameter-store # or: direct
CloudFormation Errors¶
Stack already exists¶
Error: Stack [globalbank-prod-c001-us-west-2-sg-stack] already exists
Cause: Security groups were already deployed.
Solution:
Use show-changes to see what would change
Use delete-security-groups --force to remove and redeploy
Use the workload field for a separate SG set
Stack creation failed¶
Error: Stack creation failed or timed out
Cause: CloudFormation couldn’t create one or more resources.
Solution:
Check the CloudFormation console for detailed error events
Common causes:
VPC doesn’t exist anymore
Security group limit reached (default: 2500 per region)
IAM permissions insufficient
Resource limit exceeded¶
Error: The maximum number of security groups has been reached
Solution: Request a limit increase via AWS Service Quotas, or delete unused security groups.
Scenario Errors¶
Scenario not found¶
Error: Scenario YAML not found: schemas/scenarios/my-scenario.yaml
Cause: Scenario name doesn’t match any file.
Solution:
docker run --rm \
-v ~/.aws:/home/sguser/.aws:ro \
-v $(pwd)/sg/configs:/app/configs:ro \
-v $(pwd)/sg/reports:/app/reports \
sg-provisioner:latest \
-con my-config.yaml -act list-scenarios
Use the exact name from the list (without .yaml).
Invalid rule in scenario¶
Error: Specify either 'source' or 'source_tier', not both
Cause: A rule has both source and source_tier set.
Solution: Use one or the other:
# ✅ CIDR-based
- protocol: tcp
  port: 443
  source: "0.0.0.0/0"
# ✅ Tier-based
- protocol: tcp
  port: 8080
  source_tier: web
Invalid port in override¶
Error: Port values must be between 1 and 65535
Solution: Ensure port values are valid integers:
overrides:
  app:
    port_overrides:
      - protocol: tcp
        old_port: 8080
        new_port: 8443 # Must be 1-65535
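Port values generated by a script can be range-checked before they are written into the config. A sketch; valid_port is a hypothetical helper:

```shell
# Hypothetical check: succeed only for an integer between 1 and 65535.
valid_port() {
    case "$1" in
        '' | *[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    [ "${#1}" -le 5 ] || return 1    # avoid overflow on very long digit strings
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Example:
#   valid_port 8443 && echo ok
```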
Template Validation Errors¶
Unresolved reference¶
Error: Unresolved !Ref 'XyzSecurityGroup'
Cause: A rule references a tier that doesn’t exist in the scenario.
Solution: Check that source_tier / destination_tier values match actual tier names in the scenario.
Unknown resource type¶
Error: Unknown resource type 'AWS::EC2::Instance' on ResourceName
Cause: Template contains unexpected resource types.
Solution: SG templates should only contain:
AWS::EC2::SecurityGroup
AWS::EC2::SecurityGroupIngress
AWS::EC2::SecurityGroupEgress
AWS::SSM::Parameter
Docker Errors¶
Permission denied on volumes¶
Error: Permission denied: '/app/reports/...'
Cause: Volume mount permissions mismatch.
Solution: Ensure host directories exist and are writable:
mkdir -p sg/{configs,policies,reports,templates,schemas}
chmod 777 sg/reports sg/policies sg/templates
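To find out which directory is the problem, each mount source can be tested on the host. A sketch; unwritable_dirs is a hypothetical helper, and it checks access for the current host user only, which may differ from the user inside the container:

```shell
# Hypothetical helper: print each directory that is missing or not writable
# by the current host user.
unwritable_dirs() (
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "$dir"
        fi
    done
)

# Example:
#   unwritable_dirs sg/reports sg/policies sg/templates
```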
Config file not found¶
Error: Configuration file not found: /app/configs/my-config.yaml
Cause: Config file not in the mounted volume.
Solution: Verify the mount and filename:
ls $(pwd)/sg/configs/ # Should show your config file
docker run --rm \
-v $(pwd)/sg/configs:/app/configs:ro \
...
License validation failed¶
Error: License validation failed. Exiting.
Cause: No active AWS Marketplace subscription.
Solution: Subscribe to the SG Provisioner product on AWS Marketplace. Ensure the subscription is active and associated with the AWS account you are using.
Deletion Errors¶
Stack does not exist¶
Error: SG stack 'my-stack' does not exist
Cause: Stack was already deleted or never created.
Solution: No action needed — resources are already gone.
Delete failed — resources in use¶
Error: resource sg-xxx has a dependent object
Cause: Security group is attached to an ENI (EC2 instance, Lambda, RDS, etc.).
Solution:
Identify what’s using the SG:
aws ec2 describe-network-interfaces --filters Name=group-id,Values=sg-xxx --region us-west-2
Remove the SG from those resources first
Retry deletion
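The describe-network-interfaces output is long. A --query expression can reduce it to the fields that identify the attached resource. A sketch; sg_attachments is a hypothetical helper:

```shell
# Hypothetical helper: list ENI ID, interface type, and description for
# every network interface that uses the given security group.
# Usage: sg_attachments SG_ID REGION
sg_attachments() {
    aws ec2 describe-network-interfaces \
        --filters "Name=group-id,Values=$1" \
        --region "$2" \
        --query 'NetworkInterfaces[].[NetworkInterfaceId,InterfaceType,Description]' \
        --output text
}

# Example:
#   sg_attachments sg-xxx us-west-2
```

The description column usually names the owning service, which helps decide where the security group has to be detached.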
Force flag required¶
Error: Security group deletion requires --force flag
Solution: Add --force:
docker run --rm \
-v ~/.aws:/home/sguser/.aws:ro \
-v $(pwd)/sg/configs:/app/configs:ro \
-v $(pwd)/sg/reports:/app/reports \
sg-provisioner:latest \
-con my-config.yaml -act delete-security-groups --force
Drift Detection Issues¶
No deployed stack found¶
Error: No deployed stack found for drift detection
Cause: Stack hasn’t been deployed yet.
Solution: Deploy first with create-security-groups --force, then check drift.
Drift detection timed out¶
Error: Drift detection timed out
Cause: AWS took too long to complete drift analysis (>10 minutes).
Solution: Retry — this is usually transient. If persistent, check the CloudFormation console.
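Retrying can be automated with a small wrapper. A sketch; retry is a hypothetical helper, and the attempt count and delay are arbitrary values to adjust:

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARG...]
retry() (
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && exit 0
        [ "$n" -ge "$attempts" ] && exit 1
        echo "attempt $n failed; retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
)

# Example (substitute the full docker run command for the placeholder):
#   retry 3 60 docker run --rm ... -con my-config.yaml -act <action>
```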
Drift detected¶
Message: DRIFT DETECTED in stack
Cause: Someone modified security group rules outside of CloudFormation.
Solution:
Review the drift details in the output
Either:
Update your config/scenario to match the desired state, regenerate the template, and deploy
Or revert the manual changes by deleting the stack and redeploying with create-security-groups --force
Parameter Store Issues¶
Parameter not found after deployment¶
Cause: Deployment succeeded but parameter storage failed.
Solution: Check manually:
aws ssm get-parameters-by-path --path /sg/my-sg-name/ --region us-west-2
If empty, the stack outputs may not have been stored. Check the deployment log in reports/.
Stale parameters after deletion¶
Cause: Stack deleted but parameter cleanup failed.
Solution: Delete manually:
aws ssm delete-parameter --name /sg/my-sg-name/web/SecurityGroupId --region us-west-2
aws ssm delete-parameter --name /sg/my-sg-name/app/SecurityGroupId --region us-west-2
aws ssm delete-parameter --name /sg/my-sg-name/db/SecurityGroupId --region us-west-2
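When the tier names are not known in advance, the leftover parameters can be listed by path and deleted in a loop. A sketch; delete_params_under is a hypothetical helper. It deletes everything under the given path, so review the get-parameters-by-path output first:

```shell
# Hypothetical helper: delete every parameter found under a path.
# Usage: delete_params_under PATH REGION
delete_params_under() (
    path=$1
    region=$2
    aws ssm get-parameters-by-path --path "$path" --recursive \
        --region "$region" --query 'Parameters[].Name' --output text |
    tr '\t' '\n' |
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        echo "deleting $name"
        aws ssm delete-parameter --name "$name" --region "$region"
    done
)

# Example:
#   delete_params_under /sg/my-sg-name/ us-west-2
```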
Getting Help¶
Check logs in the reports/ directory for detailed error information
Use --verbose debug for maximum logging detail
Review the Configuration Reference for valid values
Review the Scenarios Reference for scenario structure