
What are the most common AWS architecture mistakes (and how do you catch them)?

The most common AWS architecture mistakes — public databases, single-AZ deployments, API Gateway wired to RDS, misplaced NAT gateways, managed services inside a VPC — why each is wrong, and how to catch them in the diagram before they ship.

Quick answer

The most common AWS architecture mistakes are placing a database in a public subnet or otherwise exposing it to the internet, wiring API Gateway directly to a relational database, running in a single Availability Zone, placing a NAT gateway in a private subnet, and nesting regional managed services like S3 or DynamoDB inside a VPC. Each one is visible in the architecture diagram before it ever reaches production.

Most AWS architecture mistakes are not subtle. A database in a public subnet, a single Availability Zone behind a production load balancer, a NAT gateway that can never reach the internet — these are textbook anti-patterns, and they are all visible in the architecture diagram long before any infrastructure is provisioned.

The problem is that diagrams are reviewed by eye. A reviewer who knows AWS well will catch a public database; a reviewer in a hurry, or one less familiar with the service, will not. And AI assistants make this worse: ask ChatGPT, Claude, or Gemini for an AWS diagram and they will happily reproduce the same anti-patterns with total confidence — see our guide on getting LLMs to generate valid Mermaid AWS diagrams at /blog/how-to-prompt-llms-for-mermaid-aws-diagrams.

This post walks through the most common AWS architecture mistakes, why each one is wrong, the pattern that triggers it, and the fix. Every example is written in Mermaid architecture-beta syntax — if you are new to it, start with how to draw AWS architecture diagrams in Mermaid at /blog/how-to-draw-aws-architecture-diagrams-in-mermaid. The point is not just to list mistakes, but to show that each maps to a rule you can validate against automatically.

Why is the architecture diagram the cheapest place to catch a mistake?

The architecture diagram is the earliest concrete artifact in a system's life. It exists before the Terraform, before the first deploy, before anything is running and costing money. A mistake fixed in the diagram costs a sentence in a review comment. The same mistake fixed after it reaches production costs a migration, a maintenance window, and sometimes an incident.

The catch is that diagram review is manual and inconsistent. Two reviewers will flag two different sets of issues, and neither will be exhaustive. What is missing is the same thing that transformed code review years ago: a linter — something that applies the same rules every time, explains each finding, and never gets tired on the fortieth diagram of the week.

That is what LatixEngine does for architecture. It reads the diagram and checks it against rules derived from official AWS documentation and the Well-Architected Framework, so the easy, repeatable findings are caught automatically and human reviewers can focus on the judgment calls.

Is it a mistake to put a database in a public subnet?

Yes — and it is the single most common serious mistake in AWS diagrams. A database in a public subnet, or one marked publicly accessible, can be reached from the internet. That exposes it to credential-stuffing, brute-force attempts, and exploitation of any unpatched engine vulnerability.

The rule is simple: data stores belong in a private subnet, with no public IP and a security group that only accepts connections from the application tier. The database is never a direct destination from the Internet Gateway.

architecture-beta
    group region(aws:region)[AWS Region]
        group vpc(aws:virtual-private-cloud-vpc)[VPC] in region
            group public_subnet(aws:public-subnet)[Public Subnet] in vpc
                service app(aws:ec2-instance-contents)[App Server] in public_subnet
            group private_subnet(aws:private-subnet)[Private Subnet] in vpc
                service db(aws:arch-amazon-rds)[RDS] in private_subnet
    app:B --> T:db

The RDS instance sits in the private subnet; only the application server reaches it. If your diagram has a database in a public subnet, or a line drawn straight from an Internet Gateway to a database, that is the mistake to fix first.
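
As a rough illustration of how this rule could be checked mechanically, here is a minimal Python sketch. The `Node` model, the kind labels ("rds", "igw", and so on), and the function name are assumptions made up for this example; they are not LatixEngine's actual schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical diagram node; field names are illustrative only."""
    name: str
    kind: str                # e.g. "ec2", "rds", "igw"
    subnet: Optional[str]    # "public", "private", or None if outside a subnet


# Assumed labels for data stores; extend as needed.
DATA_STORES = {"rds", "aurora"}


def public_database_findings(nodes, edges):
    """Flag data stores in a public subnet or wired straight to an Internet Gateway."""
    by_name = {n.name: n for n in nodes}
    findings = []
    for n in nodes:
        if n.kind in DATA_STORES and n.subnet == "public":
            findings.append(f"{n.name}: data store placed in a public subnet")
    for src, dst in edges:
        if by_name[src].kind == "igw" and by_name[dst].kind in DATA_STORES:
            findings.append(f"{dst}: direct edge from an Internet Gateway")
    return findings


# The layout from the diagram above: app in public, database in private.
good = [Node("app", "ec2", "public"), Node("db", "rds", "private")]
good_findings = public_database_findings(good, [("app", "db")])

# The anti-pattern: database in a public subnet, wired to the Internet Gateway.
bad = [Node("igw", "igw", None), Node("db", "rds", "public")]
bad_findings = public_database_findings(bad, [("igw", "db")])
```

The clean layout produces no findings, while the anti-pattern trips both checks: one for placement and one for the edge.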

Can API Gateway connect directly to a database?

Not to a relational database. API Gateway integrates over HTTP and with AWS services that expose an API — Lambda, Step Functions, SQS, SNS, S3, and DynamoDB. It has no integration type that can open a connection to RDS, run SQL, and manage a connection pool. A line drawn directly from API Gateway to RDS is invalid; the path needs a Lambda function or a container backend in between.

architecture-beta
    service api(aws:arch-amazon-api-gateway)[API Gateway]
    service fn(aws:arch-aws-lambda)[Lambda]
    service db(aws:arch-amazon-rds)[RDS]
    api:R --> L:fn
    fn:R --> L:db

DynamoDB is the exception

API Gateway can connect directly to DynamoDB through an AWS service integration, because DynamoDB exposes a signed HTTP API — no Lambda required. So a direct API-Gateway-to-DynamoDB edge is correct, while a direct API-Gateway-to-RDS edge is a mistake. A good validation rule has to know the difference; a naive "API Gateway must not touch a database" rule would produce false positives.

This is exactly the kind of mistake AI-generated diagrams make, because the visual shape of "gateway to database" looks plausible. The wire protocol is what makes it wrong.
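
To make the difference concrete, here is a small Python sketch of an edge rule that flags API Gateway pointing at a relational database but leaves DynamoDB alone. The kind labels and the function name are hypothetical, chosen for this example only, and the rule is a simplification of what a real validator would need.

```python
from typing import Optional

# Assumed labels for relational engines that speak a database wire protocol.
RELATIONAL_KINDS = {"rds", "aurora"}


def api_gateway_edge_finding(source_kind: str, target_kind: str) -> Optional[str]:
    """Return a finding for an API Gateway edge into a relational database, else None."""
    if source_kind == "api-gateway" and target_kind in RELATIONAL_KINDS:
        return (
            "API Gateway has no integration that speaks a database wire protocol; "
            "put a Lambda function or container backend in between"
        )
    # DynamoDB, Lambda, SQS and other API-based targets fall through as valid.
    return None


rds_edge = api_gateway_edge_finding("api-gateway", "rds")
dynamodb_edge = api_gateway_edge_finding("api-gateway", "dynamodb")
lambda_to_rds = api_gateway_edge_finding("lambda", "rds")
```

Keying the rule on the target's protocol rather than on "is it a database" is what avoids the false positive on DynamoDB.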

Why is a single-Availability-Zone deployment a reliability risk?

An Availability Zone is a failure domain. If your production workload runs in one AZ, the loss of that AZ takes the whole system down, and AZ-level events do happen. The Well-Architected reliability pillar calls for spreading the workload across at least two AZs.

In a diagram, the tell is a single subnet holding everything, or an RDS instance with no standby. The fix is to place subnets in at least two AZs and let the load balancer and database span them.

architecture-beta
    group region(aws:region)[us-east-1]
        group vpc(aws:virtual-private-cloud-vpc)[VPC] in region
            group az_a(aws:public-subnet)[AZ a] in vpc
                service web_a(aws:ec2-instance-contents)[Web a] in az_a
            group az_b(aws:public-subnet)[AZ b] in vpc
                service web_b(aws:ec2-instance-contents)[Web b] in az_b

Two subnets, two zones, two web servers. For the data tier, the equivalent is enabling RDS Multi-AZ so a standby exists in a second zone.
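
A check for this can be very small. The sketch below assumes the diagram has already been reduced to a mapping from tier name to the set of AZs that tier occupies; that mapping, the tier names, and the function name are all assumptions for illustration.

```python
def single_az_tiers(tier_zones: dict[str, set[str]]) -> list[str]:
    """Return the tiers that occupy fewer than two Availability Zones."""
    return sorted(tier for tier, zones in tier_zones.items() if len(zones) < 2)


# Web tier spans two zones; the database has no standby in a second zone.
layout = {
    "web": {"us-east-1a", "us-east-1b"},
    "database": {"us-east-1a"},
}
at_risk = single_az_tiers(layout)
```

Here only the database tier is reported, which corresponds to an RDS instance without Multi-AZ enabled.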

Where should a NAT gateway go — public or private subnet?

A NAT gateway belongs in a public subnet. It needs a route to the Internet Gateway to provide outbound internet access to private resources. Private subnets then send their 0.0.0.0/0 traffic to the NAT, not to the Internet Gateway directly.

The common mistake is placing the NAT gateway inside a private subnet. With no path to the Internet Gateway, it cannot do its job — private resources lose outbound connectivity entirely.

architecture-beta
    group vpc(aws:virtual-private-cloud-vpc)[VPC]
        service igw(aws:res-amazon-vpc-internet-gateway)[Internet Gateway] in vpc
        group public_subnet(aws:public-subnet)[Public Subnet] in vpc
            service nat(aws:res-amazon-vpc-nat-gateway)[NAT Gateway] in public_subnet
        group private_subnet(aws:private-subnet)[Private Subnet] in vpc
            service worker(aws:ec2-instance-contents)[Worker] in private_subnet
    worker:T --> B:nat
    nat:L --> R:igw

NAT in the public subnet, the private worker routing outbound through it. If your diagram shows a NAT gateway inside a private subnet, that is a broken network path.
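
One way to express this check is to treat a subnet as public only when its default route targets the Internet Gateway, then require every NAT gateway to sit in such a subnet. The `Subnet` model and the labels below are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subnet:
    """Hypothetical subnet model; not a real AWS or LatixEngine type."""
    name: str
    default_route: Optional[str]          # target of 0.0.0.0/0: "igw", "nat", or None
    resources: list[str] = field(default_factory=list)


def nat_placement_findings(subnets):
    """Flag NAT gateways in subnets whose default route does not reach the Internet Gateway."""
    findings = []
    for s in subnets:
        if "nat-gateway" in s.resources and s.default_route != "igw":
            findings.append(f"{s.name}: NAT gateway has no route to the Internet Gateway")
    return findings


correct = [
    Subnet("public", "igw", ["nat-gateway"]),
    Subnet("private", "nat", ["ec2"]),
]
broken = [Subnet("private", "nat", ["nat-gateway", "ec2"])]

correct_findings = nat_placement_findings(correct)
broken_findings = nat_placement_findings(broken)
```

The correct layout passes; the broken one reports the private subnet that holds the NAT gateway.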

Should managed services like S3, DynamoDB, or SSM live inside a VPC?

No. S3, DynamoDB, and SSM Parameter Store are regional managed services. They live outside the VPC boundary and are reached over the public AWS API. Drawing them nested inside a subnet reflects a wrong mental model — they are not VPC resources and do not have a place in your subnet topology.

If you need traffic to those services to stay off the public internet, the correct construct is a VPC endpoint: a Gateway endpoint for S3 and DynamoDB, and an Interface endpoint (PrivateLink) for most others. The endpoint lives in your VPC; the service does not.

This is a placement rule, and it generalizes: a managed, regional service drawn inside a VPC or subnet is almost always a mistake, regardless of which service it is.
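
A placement rule like this can be sketched as a lookup against the group a service is drawn in. The service labels, group labels, and function name below are assumptions for illustration; the endpoint types follow the Gateway versus Interface split described above.

```python
from typing import Optional

# Regional managed services named in this post (hypothetical labels).
REGIONAL_MANAGED = {"s3", "dynamodb", "ssm-parameter-store"}
GATEWAY_ENDPOINT_SERVICES = {"s3", "dynamodb"}
VPC_GROUPS = {"vpc", "public-subnet", "private-subnet"}


def placement_finding(kind: str, parent_group: Optional[str]) -> Optional[str]:
    """Return a finding when a regional managed service is nested inside a VPC or subnet."""
    if kind in REGIONAL_MANAGED and parent_group in VPC_GROUPS:
        endpoint = "Gateway" if kind in GATEWAY_ENDPOINT_SERVICES else "Interface"
        return (
            f"{kind} is a regional service: draw it outside the VPC and add a "
            f"{endpoint} endpoint if traffic must stay private"
        )
    return None


s3_in_subnet = placement_finding("s3", "private-subnet")
ssm_in_vpc = placement_finding("ssm-parameter-store", "vpc")
s3_in_region = placement_finding("s3", "region")
```

The finding carries the fix with it, so the reviewer sees which endpoint type to add rather than just a rejection.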

What can an AWS WAF actually be attached to?

AWS WAF is not a box that traffic flows through. It is a policy (a web ACL) attached to a front-door resource and evaluated when a request arrives. It attaches only to specific resource types: CloudFront, Application Load Balancer, API Gateway (REST APIs), AppSync, Cognito user pools, App Runner, and Verified Access.

It does not attach to a Network Load Balancer (which operates at layer 4, below WAF's layer-7 inspection), and it does not attach directly to backends such as EC2, Lambda, or RDS. The frequent mistakes are a WAF drawn in front of an NLB, or a WAF wired straight to a backend service.

Shield follows the same pattern

AWS Shield is the same kind of attached protection — it guards the resource, it is not a hop. WAF and Shield are peers layered on the same front door (Shield for layer 3/4 DDoS, WAF for layer 7), so both point at the protected resource and neither connects to the other. Their valid-target lists differ: Shield also covers NLB, Route 53, Global Accelerator, and Elastic IPs, which WAF does not.
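
The attachment rules above reduce to set membership. In the sketch below, the WAF target set mirrors the list in this post and the Shield-only set mirrors the extra targets named above; the labels and the function name are hypothetical, and the sketch does not attempt to validate the full Shield target list.

```python
from typing import Optional

# WAF attachment targets as listed in this post (hypothetical labels).
WAF_TARGETS = {
    "cloudfront", "alb", "api-gateway", "appsync",
    "cognito-user-pool", "app-runner", "verified-access",
}
# Targets the post names as covered by Shield but not by WAF.
SHIELD_ONLY_TARGETS = {"nlb", "route53", "global-accelerator", "elastic-ip"}


def attachment_finding(protection: str, target: str) -> Optional[str]:
    """Check an edge from a protection service ("waf" or "shield") to its target."""
    if target in {"waf", "shield"}:
        return "WAF and Shield are peers: point each at the protected resource, not at each other"
    if protection == "waf" and target not in WAF_TARGETS:
        hint = " (Shield can protect it)" if target in SHIELD_ONLY_TARGETS else ""
        return f"WAF cannot attach to {target}{hint}"
    return None


waf_on_alb = attachment_finding("waf", "alb")
waf_on_nlb = attachment_finding("waf", "nlb")
waf_on_ec2 = attachment_finding("waf", "ec2")
shield_on_nlb = attachment_finding("shield", "nlb")
waf_to_shield = attachment_finding("waf", "shield")
```

WAF on an ALB and Shield on an NLB both pass; WAF on an NLB, WAF on EC2, and a WAF-to-Shield edge are each reported.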

How do you catch these mistakes automatically instead of by eye?

Every mistake above has the same shape: a published AWS best practice, a recognizable pattern that violates it, and a concrete fix. That structure is exactly what a rule encodes. LatixEngine ships a growing library of these rules, each one carrying an explanation and a fix hint, grouped by category — security, reliability, connectivity, placement.

You draw the architecture; the engine reads it and reports the violations, the same way every time. The rules are provider-aware, so an AWS diagram is checked against AWS rules, and the same diagram can mix AWS, Azure, and GCP. The validation engine — and its rule library — is the core of the product; Mermaid is simply the first surface it runs on.
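
To show what "a rule with an explanation and a fix hint" might look like as data, here is a minimal linter sketch in Python. It is not LatixEngine's implementation: the `Rule` fields, the rule IDs, and the dictionary-based diagram format are all invented for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: what is wrong, why, and how to fix it."""
    rule_id: str
    category: str
    explanation: str
    fix_hint: str
    violated: Callable[[dict], bool]   # returns True when the diagram breaks the rule


RULES = [
    Rule(
        "db-public-subnet", "security",
        "Database is reachable from the internet.",
        "Move it to a private subnet with no public IP.",
        lambda d: any(n["kind"] == "rds" and n["subnet"] == "public" for n in d["nodes"]),
    ),
    Rule(
        "single-az", "reliability",
        "One failure domain takes the system down.",
        "Span at least two AZs.",
        lambda d: len({n["az"] for n in d["nodes"]}) < 2,
    ),
]


def lint(diagram: dict) -> list[str]:
    """Apply every rule in order and format each violation the same way."""
    return [
        f"[{r.category}] {r.rule_id}: {r.explanation} Fix: {r.fix_hint}"
        for r in RULES
        if r.violated(diagram)
    ]


diagram = {
    "nodes": [
        {"kind": "ec2", "subnet": "public", "az": "a"},
        {"kind": "rds", "subnet": "public", "az": "a"},
    ]
}
report = lint(diagram)
```

Because the rules are plain data applied in a fixed order, running `lint` twice on the same diagram gives the same report, which is the consistency a manual review cannot guarantee.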

The deeper point is that correctness is not about whether the diagram renders. It is about whether the architecture it describes is sound. A diagram can be perfectly drawn and still describe a publicly exposed database. Validation closes that gap.

Quick reference: the most common AWS mistakes and how to fix them

| Mistake | Why it is wrong | Fix | Category |
| --- | --- | --- | --- |
| Database in a public subnet | Reachable from the internet | Move to a private subnet, no public IP | Security |
| API Gateway wired to RDS | No integration for DB wire protocols | Add a Lambda or container backend | Connectivity |
| Single Availability Zone | One failure domain takes the system down | Span at least two AZs, enable Multi-AZ | Reliability |
| NAT gateway in a private subnet | No route to the Internet Gateway | Place NAT in a public subnet | Connectivity |
| S3 / DynamoDB / SSM inside a VPC | Regional managed services live outside the VPC | Model outside the VPC; add a VPC endpoint | Placement |
| WAF on an NLB or a backend | WAF only attaches to layer-7 front doors | Attach to CloudFront, ALB, or API Gateway | Security |

Want to share a validated diagram in your docs once it is clean? See how to render and embed Mermaid AWS diagrams in GitHub, Notion, and Confluence at /blog/render-embed-mermaid-aws-diagrams-github-notion-confluence.

Frequently asked questions

What is the most common AWS security mistake in architecture diagrams?

Putting a database in a public subnet or marking it publicly accessible. The fix is to place RDS, Aurora, and other data stores in a private subnet with no public IP and a security group that only allows traffic from the application tier.

Can API Gateway connect directly to DynamoDB?

Yes. API Gateway can integrate directly with DynamoDB through an AWS service integration, because DynamoDB exposes an HTTP API. It cannot integrate directly with RDS, because relational databases speak engine-specific wire protocols over TCP (such as the MySQL or PostgreSQL protocol) rather than HTTP. That path needs a Lambda or container backend in between.

Should a database be in a public or private subnet?

Private. A database should sit in a private subnet with no route to an Internet Gateway and no public IP. Application servers in a public or private subnet connect to it over the VPC; the database itself is never reachable from the internet.

Do managed services like S3 and DynamoDB belong inside a VPC?

No. S3, DynamoDB, and SSM Parameter Store are regional managed services that live outside the VPC boundary and are reached over the AWS API. Drawing them inside a subnet is a modeling mistake. To keep traffic private, add a VPC endpoint — a Gateway endpoint for S3 and DynamoDB, an Interface endpoint for most others.

What resources can AWS WAF be attached to?

AWS WAF attaches to CloudFront, Application Load Balancer, API Gateway, AppSync, Cognito user pools, App Runner, and Verified Access. It cannot attach to a Network Load Balancer (which is layer 4) or directly to backends such as EC2, Lambda, or RDS. WAF is an attached policy, not a hop in the traffic path.

How does LatixEngine validate an AWS architecture diagram?

LatixEngine reads the architecture-beta diagram you draw and checks it against a library of rules derived from official AWS guidance and the Well-Architected Framework. Each rule reports what is wrong, why it matters, and how to fix it. Paste a diagram into the editor at latixengine.com/editor to see the violations inline.

Try it in the editor

Paste any example in this post into the LatixEngine editor to render it with native cloud icons and validate it against AWS, Azure, and GCP best practices. No login, no install.
