Performance Tuning Guide
Performance optimization strategies for VPC infrastructure provisioned by the VPC Provisioner.
Network Performance
Bandwidth Limits

| Resource | Bandwidth | Notes |
|---|---|---|
| NAT Gateway | Up to 100 Gbps | Scales automatically |
| VPC Peering | No limit | Uses AWS backbone |
| Internet Gateway | No limit | Scales automatically |
| EC2 Instance | Varies by type | m5.xlarge = up to 10 Gbps |
| S3 Gateway Endpoint | No limit | Free, no NAT overhead |
Latency Optimization

| Traffic Path | Typical Latency | Optimization |
|---|---|---|
| Same AZ | < 1 ms | Co-locate communicating resources |
| Cross-AZ | 1-2 ms | Use for HA, avoid for latency-sensitive traffic |
| To S3 (same region) | 1-5 ms | Use VPC Gateway Endpoint |
| To S3 via NAT | 5-15 ms | Avoid; use VPC Gateway Endpoint instead |
| Cross-region | 20-100 ms | Use Transfer Acceleration for S3 |
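To see why a few milliseconds per request matter, the following rough sketch multiplies the latencies from the table by a request count. It assumes strictly sequential (or evenly parallel) small requests and uses the midpoints of the ranges above (3 ms and 10 ms) purely for illustration; the function name and the request count are hypothetical.

```python
def total_wait_seconds(requests: int, latency_ms: float, concurrency: int = 1) -> float:
    """Approximate time spent waiting on round trips, ignoring transfer time."""
    return requests * latency_ms / 1000 / concurrency


# Illustrative midpoints of the ranges in the table, not measurements.
via_endpoint = total_wait_seconds(100_000, latency_ms=3)   # S3 via Gateway Endpoint
via_nat = total_wait_seconds(100_000, latency_ms=10)       # S3 via NAT Gateway

print(f"Gateway Endpoint: {via_endpoint:.0f} s, NAT: {via_nat:.0f} s")
```

For workloads that read many small objects, this kind of estimate can help decide whether the traffic path or the request concurrency is the first thing to change.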
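NAT Gateway Performance
Throughput
A NAT Gateway starts at 5 Gbps of bandwidth and scales automatically up to 100 Gbps. For sustained high throughput, keep these per-gateway limits in mind:
Each NAT Gateway supports up to 55,000 simultaneous connections to a single destination
It can open roughly 900 new connections per second to a single destination
Beyond these limits, new connections fail with port allocation errors (reported by the ErrorPortAllocation metric in CloudWatch)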
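The limits above can be turned into a simple headroom check. This is a minimal sketch: the two constants come from the text, while the function names and the 80% warning threshold are assumptions chosen for illustration.

```python
# Per-destination limits for a single NAT Gateway, as quoted above.
MAX_CONNECTIONS_PER_DESTINATION = 55_000
MAX_NEW_CONNECTIONS_PER_SECOND = 900


def nat_headroom(active_connections: int, new_connections_per_second: float) -> dict:
    """Return the fraction of each per-destination limit currently in use."""
    return {
        "connection_utilization": active_connections / MAX_CONNECTIONS_PER_DESTINATION,
        "rate_utilization": new_connections_per_second / MAX_NEW_CONNECTIONS_PER_SECOND,
    }


def over_threshold(usage: dict, threshold: float = 0.8) -> list:
    """List the limits whose utilization meets or exceeds the assumed threshold."""
    return [name for name, value in usage.items() if value >= threshold]


usage = nat_headroom(active_connections=46_750, new_connections_per_second=450)
print(usage)
print(over_threshold(usage))
```

If a single destination regularly shows up over the threshold, spreading traffic across more NAT Gateways or moving that traffic to a VPC Endpoint are the usual options.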
High Availability Configuration
# Single NAT: all traffic through one gateway
vpc:
  nat_gateway:
    enabled: true
    high_availability: false  # Single point of failure

# HA NAT: one per AZ, traffic stays local
vpc:
  nat_gateway:
    enabled: true
    high_availability: true  # Better performance + resilience
HA NAT Gateways improve performance because traffic stays within the same AZ, with no cross-AZ hop.
When to Avoid NAT Gateway
For AWS service traffic (S3, DynamoDB), use VPC Gateway Endpoints instead of NAT Gateway:
Lower latency (direct path vs NAT hop)
Higher throughput (no NAT bottleneck)
Zero cost (Gateway Endpoints are free)
Subnet and AZ Optimization
Co-Location Strategy
Place resources that communicate frequently in the same AZ:
us-west-2a:
  private-app-subnet-1:
    - SageMaker training instances
    - Lambda inference functions
  database-subnet-1:
    - RDS primary instance
us-west-2b:
  private-app-subnet-2:
    - SageMaker endpoint instances (HA)
  database-subnet-2:
    - RDS standby instance (HA)
Subnet Sizing
Size subnets based on expected resource count:

| Subnet CIDR | Usable IPs | Best For |
|---|---|---|
| /24 | 251 | Application subnets (EC2, ECS, Lambda) |
| /26 | 59 | Database subnets (RDS, ElastiCache) |
| /28 | 11 | Small utility subnets |
| /20 | 4,091 | Large-scale EKS or SageMaker workloads |

The provisioner defaults (/24 for app, /26 for database) work well for most ML workloads.
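The usable IP counts follow from the prefix length minus the five addresses AWS reserves in every subnet. A short sketch using Python's standard `ipaddress` module reproduces them; the example CIDR blocks are illustrative and not provisioner defaults.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses available to resources in an AWS subnet."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AWS_RESERVED_PER_SUBNET


# Example CIDRs chosen only to show each prefix length.
for cidr in ("10.0.11.0/24", "10.0.21.0/26", "10.0.31.0/28", "10.0.64.0/20"):
    print(cidr, usable_ips(cidr))
```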
Multiple Private Subnets
For ML workloads that need isolation:
vpc:
  subnets:
    private:
      - name: private-app-subnet-1  # Application tier
        cidr: 10.0.11.0/24
        az: us-west-2a
      - name: private-ml-subnet-1  # ML training tier
        cidr: 10.0.13.0/24
        az: us-west-2a
Separate ML training from application workloads to prevent resource contention.
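When adding subnets by hand, it can help to check that the CIDR blocks fit inside the VPC and do not overlap before running the provisioner. The sketch below is a standalone check, not part of the provisioner; the `check_subnets` helper and the `10.0.0.0/16` VPC range are assumptions for illustration.

```python
import ipaddress
from itertools import combinations


def check_subnets(vpc_cidr: str, subnets: dict) -> list:
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []
    for name, network in networks.items():
        if not network.subnet_of(vpc):
            problems.append(f"{name} ({network}) is outside {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    return problems


# Subnets from the configuration above; the VPC CIDR is an assumed value.
print(check_subnets("10.0.0.0/16", {
    "private-app-subnet-1": "10.0.11.0/24",
    "private-ml-subnet-1": "10.0.13.0/24",
}))
```

An empty list means the layout is consistent; each string in a non-empty list names the offending subnet.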
VPC Endpoint Performance
Gateway Endpoints (S3, DynamoDB)
Gateway Endpoints route traffic directly to the service without NAT:
Without Endpoint: Instance → NAT Gateway → Internet Gateway → S3
With Endpoint: Instance → VPC Endpoint → S3 (direct)
Performance improvement:
Latency: 50-70% reduction (no NAT hop)
Throughput: No NAT Gateway bottleneck
Cost: Free (no NAT data processing charges)
Configure via S3 Provisioner:
s3:
  vpc_id: "vpc-0a1b2c3d4e5f6g7h8"
  route_table_ids: "rtb-0a1b2c3d,rtb-4e5f6g7h"
Interface Endpoints (SageMaker, ECR, CloudWatch)
For other AWS services, consider Interface Endpoints:
SageMaker API and Runtime
ECR (for container image pulls)
CloudWatch Logs
SSM Parameter Store
Each Interface Endpoint costs about $0.01 per hour per AZ, plus a per-GB data processing charge, but eliminates the NAT dependency for AWS service traffic.
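ML Workload Optimization
SageMaker Training in VPC
from sagemaker.estimator import Estimator

# Place training in private subnets with an S3 VPC Endpoint
estimator = Estimator(
    image_uri="<training-image-uri>",        # placeholder
    role="<sagemaker-execution-role-arn>",   # placeholder
    instance_count=1,                        # placeholder
    instance_type="ml.m5.xlarge",            # placeholder
    subnets=["subnet-private-app-1", "subnet-private-app-2"],
    security_group_ids=["sg-ml-training"],
    # S3 data access goes through the VPC Endpoint: fast and free
)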
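SageMaker Distributed Training
For multi-instance training, use the same AZ to minimize inter-node latency:
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",        # placeholder
    role="<sagemaker-execution-role-arn>",   # placeholder
    instance_count=4,
    instance_type="ml.p3.16xlarge",
    subnets=["subnet-private-ml-1"],  # Single AZ for low latency
)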
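Before launching a multi-instance job, a small pre-flight check can confirm that every subnet passed to the estimator sits in one AZ. This is a sketch under assumptions: the helper names are hypothetical, and the subnet-to-AZ mapping is hard-coded here, whereas in practice it would be looked up from your account (for example from the `AvailabilityZone` field returned by EC2's DescribeSubnets).

```python
def availability_zones(subnets: list, subnet_to_az: dict) -> set:
    """Return the set of AZs covered by the given subnet IDs."""
    return {subnet_to_az[subnet] for subnet in subnets}


def is_single_az(subnets: list, subnet_to_az: dict) -> bool:
    """True when all subnets resolve to exactly one AZ."""
    return len(availability_zones(subnets, subnet_to_az)) == 1


# Assumed mapping for the placeholder subnet IDs used in this guide.
SUBNET_TO_AZ = {
    "subnet-private-ml-1": "us-west-2a",
    "subnet-private-app-1": "us-west-2a",
    "subnet-private-app-2": "us-west-2b",
}

print(is_single_az(["subnet-private-ml-1"], SUBNET_TO_AZ))
print(is_single_az(["subnet-private-app-1", "subnet-private-app-2"], SUBNET_TO_AZ))
```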
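Lambda in VPC
Lambda functions in a VPC historically paid a cold start penalty for network interface attachment. Since AWS changed Lambda VPC networking in 2019, the network interface is created when the function is configured rather than on each cold start, so the remaining cold start time comes mostly from runtime and code initialization. Mitigate with:
Provisioned Concurrency for latency-sensitive functions
Keep functions warm with scheduled invocations
Keep deployment packages and initialization code small, and avoid under-sizing memory (Lambda allocates CPU in proportion to memory, so more memory usually shortens initialization)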
Provisioner Performance
Typical Operation Times

| Action | Typical Duration | Notes |
|---|---|---|
| validate-config | < 1 second | Local only |
| create-policy | < 1 second | Local only |
| create-prov-template | < 1 second | Local only |
| validate-prov-template | < 1 second | Local only |
| show-changes | 10-30 seconds | Creates and deletes ChangeSet |
| check-drift | 15-60 seconds | Depends on resource count |
| test-deploy | 90-180 seconds | Full stack with NAT Gateways |
| create-vpc | 90-180 seconds | NAT Gateway creation is slowest |
| delete-vpc | 60-120 seconds | NAT Gateway deletion takes 5-10 min |
Why NAT Gateway Is Slow
NAT Gateway creation takes 2-5 minutes per gateway. With HA (2-3 AZs), this is the primary bottleneck in VPC deployment. This is an AWS limitation, not a provisioner limitation.
Monitoring Performance
VPC Flow Logs
Enable Flow Logs to analyze traffic patterns:
# Publishing to CloudWatch Logs also requires an IAM role ARN (placeholder below)
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0a1b2c3d4e5f6g7h8 \
  --traffic-type ALL \
  --log-destination-type cloud-watch-logs \
  --log-group-name /vpc/edge-prod-b001-us-west-2-vpc/flow-logs \
  --deliver-logs-permission-arn <flow-logs-iam-role-arn>
Analyze for:
High cross-AZ traffic (co-location opportunity)
Traffic to AWS services without VPC Endpoints
Rejected traffic (security group or NACL issues)
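As a starting point for this analysis, the sketch below parses records in the default (version 2) Flow Log format and tallies bytes and rejections per source/destination pair. It assumes the default space-separated field order; the sample records, account ID, and interface IDs are made up for illustration.

```python
from collections import Counter

# Field order of the default (version 2) Flow Log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line: str):
    """Parse one default-format record; return None for malformed or no-data rows."""
    values = line.split()
    if len(values) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, values))
    if record["log_status"] != "OK":  # NODATA and SKIPDATA rows carry "-" placeholders
        return None
    return record


def summarize(lines):
    """Return (rejected flow count, total bytes) keyed by (srcaddr, dstaddr)."""
    rejected, bytes_by_pair = Counter(), Counter()
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue
        pair = (record["srcaddr"], record["dstaddr"])
        bytes_by_pair[pair] += int(record["bytes"])
        if record["action"] == "REJECT":
            rejected[pair] += 1
    return rejected, bytes_by_pair


# Made-up sample records, not real traffic.
SAMPLE = [
    "2 123456789012 eni-0aaa 10.0.11.15 10.0.12.20 44321 5432 6 120 98000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0aaa 10.0.11.15 10.0.12.20 44322 5432 6 80 52000 1700000060 1700000120 ACCEPT OK",
    "2 123456789012 eni-0bbb 203.0.113.9 10.0.11.15 50000 22 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0ccc - - - - - - - 1700000000 1700000060 - NODATA",
]

rejected, bytes_by_pair = summarize(SAMPLE)
print(bytes_by_pair.most_common(5))
print(rejected.most_common(5))
```

Mapping the top address pairs back to their subnets and AZs then shows which flows cross an AZ boundary or reach AWS services without an endpoint.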
NAT Gateway Metrics
Monitor in CloudWatch:
BytesOutToDestination: throughput
PacketsDropCount: capacity issues (consider HA)
ConnectionAttemptCount: connection rate
ActiveConnectionCount: concurrent connections
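These metrics live in the `AWS/NATGateway` CloudWatch namespace, keyed by the `NatGatewayId` dimension. The sketch below only builds the request parameters, so it runs without AWS credentials; the helper name, the gateway ID, and the choice of statistic per metric are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Metric name -> statistic to request (the statistic choices are assumptions).
NAT_METRICS = {
    "BytesOutToDestination": "Sum",
    "PacketsDropCount": "Sum",
    "ConnectionAttemptCount": "Sum",
    "ActiveConnectionCount": "Maximum",
}


def nat_metric_queries(nat_gateway_id: str, end: datetime, hours: int = 1, period: int = 300) -> list:
    """Build one CloudWatch GetMetricStatistics parameter set per NAT Gateway metric."""
    start = end - timedelta(hours=hours)
    return [
        {
            "Namespace": "AWS/NATGateway",
            "MetricName": name,
            "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
            "StartTime": start,
            "EndTime": end,
            "Period": period,
            "Statistics": [statistic],
        }
        for name, statistic in NAT_METRICS.items()
    ]


# Hypothetical gateway ID and time window.
queries = nat_metric_queries("nat-0123456789abcdef0", datetime(2024, 1, 1, 12, tzinfo=timezone.utc))
for query in queries:
    print(query["MetricName"], query["Statistics"])
```

With boto3 installed and credentials configured, each dictionary can be passed to a CloudWatch client as `cloudwatch.get_metric_statistics(**query)`.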
Network Performance Testing
# Install iperf3 on two EC2 instances in different subnets
sudo apt install iperf3
# Server (instance in subnet A)
iperf3 -s
# Client (instance in subnet B)
iperf3 -c <server-private-ip> -t 30 -P 10
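To compare runs across subnets or AZs, the client can be run with `-J` (JSON output) and the report summarized with a few lines of Python. This is a sketch that assumes a TCP test and reads only the `end.sum_sent` and `end.sum_received` sections; the sample report is trimmed and made up, and the helper name is hypothetical.

```python
import json


def summarize_iperf3(report_json: str) -> dict:
    """Extract aggregate throughput (Gbps) and retransmits from an iperf3 -J report."""
    end = json.loads(report_json)["end"]
    return {
        "sent_gbps": end["sum_sent"]["bits_per_second"] / 1e9,
        "received_gbps": end["sum_received"]["bits_per_second"] / 1e9,
        "retransmits": end["sum_sent"].get("retransmits", 0),
    }


# Trimmed, made-up report containing only the fields read above.
SAMPLE_REPORT = json.dumps({
    "end": {
        "sum_sent": {"bits_per_second": 9.4e9, "retransmits": 12},
        "sum_received": {"bits_per_second": 9.3e9},
    }
})

print(summarize_iperf3(SAMPLE_REPORT))
```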
For cost optimization, see COST_OPTIMIZATION.md. For architecture patterns, see USER_GUIDE.md.