Introduction
Most candidates pour their energy into the resilient and secure domains and treat performance as an afterthought. That’s a mistake. Design High-Performing Architectures is worth roughly 24% of the AWS Certified Solutions Architect - Associate (SAA-C03) exam. That is nearly a quarter of your score, and the third-largest of the four domains after Design Secure Architectures (30%) and Design Resilient Architectures (26%). AWS wants to know whether you can choose the right compute, storage, database, caching, and networking option for a given workload, and scale it elastically without over-provisioning.
The challenge is breadth. “Performance” on AWS touches almost every service family: which EBS volume type for a database, when ElastiCache beats a read replica, why CloudFront and Global Accelerator both speed things up but solve different problems. The exam rarely asks you to define a term — it hands you a scenario and four plausible services and asks for the most performant (and often most cost-effective) fit.
This guide is a practitioner’s tour of the domain, organized the way the exam thinks: performant compute, storage, databases, caching, and networking, each with the distinctions and exam cues that separate the right answer from a tempting wrong one. If you need the big picture first, start with the AWS Solutions Architect Associate Guide 2026 and the exam domains strategy, then come back here to go deep on performance.
What “High-Performing” Means on AWS
Before the services, internalize the mindset. A high-performing architecture on AWS is one that:
- Selects the right resource type and size for the workload (compute, storage, database).
- Scales elastically so performance holds as load changes — out and in, not just up.
- Uses caching to cut latency and offload origins.
- Decouples components so a slow tier doesn’t drag down the rest.
- Monitors and right-sizes continuously instead of guessing.
The exam rewards picking the managed, purpose-built service over a hand-rolled one, and the option that scales automatically over one you must babysit. Keep those instincts handy as we go.
Performant Compute
Compute performance is about matching the instance or service to the workload and letting it scale.
EC2 instance families are purpose-built — knowing the letter prefixes answers a surprising number of questions:
| Family | Optimized for | Typical use |
|---|---|---|
| T / M | General purpose (T = burstable) | Web servers, small/medium apps |
| C | Compute | Batch processing, HPC, gaming servers |
| R / X | Memory | In-memory databases, large caches, analytics |
| I / D | Storage (high local I/O) | NoSQL databases, data warehousing |
| P / G / Inf | Accelerated (GPU/ML) | Machine learning, rendering |
Exam cue: “memory-intensive in-memory database” → R family; “high-performance compute / batch” → C family; “high local disk throughput” → I family with instance store.
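The letter prefix is the first part of AWS’s instance-type naming scheme: family, then generation, then optional attribute letters, then the size after the dot. The Python sketch below is a hypothetical study helper, not an AWS API. It splits a name such as r6g.xlarge along those lines and maps the family back to the table above. Unusual names (for example some high-memory or Mac types) are simply rejected.

```python
import re

# Category per family prefix, taken from the table above. Families not in the
# table (e.g. hpc, trn) fall through to "unknown".
CATEGORY_BY_FAMILY = {
    "t": "general purpose (burstable)",
    "m": "general purpose",
    "c": "compute optimized",
    "r": "memory optimized",
    "x": "memory optimized",
    "i": "storage optimized",
    "d": "storage optimized",
    "p": "accelerated computing",
    "g": "accelerated computing",
    "inf": "accelerated computing",
}

# family letters + generation digits + optional attribute letters + "." + size
_INSTANCE_TYPE = re.compile(r"^([a-z]+?)(\d+)([a-z]*)\.([a-z0-9]+)$")


def classify(instance_type: str) -> dict:
    """Split an EC2 instance type name into its parts and look up the family."""
    match = _INSTANCE_TYPE.match(instance_type.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized instance type: {instance_type!r}")
    family, generation, attributes, size = match.groups()
    return {
        "family": family,
        "generation": int(generation),
        "attributes": attributes,  # e.g. "g" = Graviton, "d" = local NVMe
        "size": size,
        "category": CATEGORY_BY_FAMILY.get(family, "unknown"),
    }


print(classify("r6g.xlarge")["category"])  # memory optimized
```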
Beyond instance choice:
- Auto Scaling keeps performance steady under variable load. Target tracking on a metric such as average CPU utilization or request count per target is the usual first choice. (The resilient architectures guide covers Auto Scaling policies in depth.)
- AWS Lambda and Fargate remove instance management entirely — strong answers when the scenario stresses “no servers to manage” and spiky or event-driven load.
- Placement groups tune EC2 networking: a cluster placement group packs instances close for low-latency, high-throughput HPC; a spread group separates them for resilience; a partition group suits large distributed systems like Hadoop.
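Target tracking behaves roughly like a thermostat. When the fleet’s average metric is above the target, capacity grows in proportion to the overshoot, and it shrinks when the metric falls below. The Python sketch below shows only that proportional arithmetic, assuming load spreads evenly across instances. The real service also applies instance warm-up, cooldowns, and more conservative scale-in. `desired_capacity` is a hypothetical name.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional estimate of the fleet size that brings an average
    per-instance metric (e.g. CPU %) back to the target value."""
    if current <= 0 or target <= 0:
        raise ValueError("current capacity and target must be positive")
    # Total load is roughly current * metric; spread it so each instance sits near target.
    raw = math.ceil(current * metric / target)
    # Clamp to the Auto Scaling group's minimum and maximum size.
    return max(min_size, min(max_size, raw))


print(desired_capacity(current=4, metric=90.0, target=50.0, min_size=2, max_size=10))  # 8
```

Four instances at 90% CPU carry about 360 “instance-percent” of load. Holding 50% per instance needs 7.2 instances, rounded up to 8.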
Performant Storage: Choosing the Right Volume and Service
Storage is one of the densest exam areas. Start with EBS volume types, a perennial question:
| Volume type | Category | Best for |
|---|---|---|
| gp3 / gp2 | General purpose SSD | Boot volumes, most workloads (gp3 lets you provision IOPS/throughput independently) |
| io2 / io1 | Provisioned IOPS SSD | I/O-intensive databases needing sustained high IOPS and low latency |
| st1 | Throughput-optimized HDD | Big, sequential workloads — big data, log processing, data warehouses |
| sc1 | Cold HDD | Infrequently accessed, lowest cost |
Exam cues: “highest IOPS for a critical database” → io2/io1; “throughput for large sequential reads, low cost” → st1; “general workload, cost-effective default” → gp3.
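The exam cues above amount to a short priority list. They can be written as the Python function below, a hypothetical study aid rather than an AWS API. The flags on `Workload` are invented for illustration. The boot-volume rule reflects the fact that st1 and sc1 can’t be used as boot volumes.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload, using the exam's cue words."""
    sustained_high_iops: bool = False  # critical, I/O-intensive database
    large_sequential_io: bool = False  # big data, log processing, warehouses
    rarely_accessed: bool = False      # cold data where lowest cost wins
    boot_volume: bool = False


def pick_ebs_volume(w: Workload) -> str:
    """Apply the exam cues in priority order and return a volume type."""
    if w.sustained_high_iops:
        return "io2"   # Provisioned IOPS SSD
    if w.boot_volume:
        return "gp3"   # HDD types (st1, sc1) can't be boot volumes
    if w.rarely_accessed:
        return "sc1"   # Cold HDD, lowest cost
    if w.large_sequential_io:
        return "st1"   # Throughput-optimized HDD
    return "gp3"       # cost-effective general-purpose default


print(pick_ebs_volume(Workload(large_sequential_io=True)))  # st1
```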
Then the file and object stores:
- Instance store — physical disk attached to the host, ephemeral but the fastest, lowest-latency option. Choose it for high-IOPS scratch/cache data you can afford to lose.
- Amazon EFS — managed, elastic NFS shared across many EC2 instances and AZs. The answer when “multiple instances need shared file access.”
- Amazon FSx — purpose-built file systems: FSx for Windows (SMB) and FSx for Lustre (high-performance computing, tightly integrated with S3).
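- Amazon S3 is massively scalable object storage. For performance, know S3 Transfer Acceleration (faster long-distance uploads via CloudFront edge locations) and S3’s request-rate model. Each prefix supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second, and there is no limit on the number of prefixes. Spreading requests across prefixes therefore raises aggregate throughput. See the S3 complete guide for SAA-C03.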
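Because S3’s request-rate baseline applies per prefix, the number of prefixes you read and write in parallel sets your aggregate ceiling. The Python sketch below does that arithmetic. It also shows one hypothetical way to spread keys: prepending a short hash-derived prefix. The rates are AWS’s documented per-prefix baselines. The `NN/` naming scheme and the helper names are assumptions for illustration.

```python
import hashlib
import math

# Documented per-prefix baselines (requests per second) for S3.
GET_HEAD_PER_PREFIX = 5_500
PUT_COPY_POST_DELETE_PER_PREFIX = 3_500


def prefixes_needed(get_rps: float, write_rps: float) -> int:
    """Smallest number of prefixes whose combined baseline covers both rates."""
    return max(
        1,
        math.ceil(get_rps / GET_HEAD_PER_PREFIX),
        math.ceil(write_rps / PUT_COPY_POST_DELETE_PER_PREFIX),
    )


def spread_key(key: str, n_prefixes: int) -> str:
    """Prepend a stable, hash-derived prefix (hypothetical scheme) so keys,
    and therefore requests, are spread evenly across n_prefixes."""
    slot = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % n_prefixes
    return f"{slot:02d}/{key}"


n = prefixes_needed(get_rps=20_000, write_rps=5_000)
print(n)  # 4
print(spread_key("logs/2026/01/app.json", n))
```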
Caching: The Performance Multiplier
Caching is the highest-leverage performance tool, and the exam tests it heavily. The key is knowing which cache for which layer:
| Service | Caches | Use it when |
|---|---|---|
| Amazon CloudFront | Static & dynamic content at edge locations | Global users; reduce latency and origin load for websites, APIs, media |
| Amazon ElastiCache | Database query results / session data (in-memory) | Offload read-heavy databases; sub-millisecond data access |
| DynamoDB Accelerator (DAX) | DynamoDB reads | Microsecond reads for a read-heavy DynamoDB table |
| API Gateway caching | API responses | Reduce calls to backends behind API Gateway |
ElastiCache itself has two engines you must distinguish:
| | Redis | Memcached |
|---|---|---|
| Data structures | Rich (lists, sets, sorted sets) | Simple key/value |
| Persistence & replication | Yes — backups, read replicas, Multi-AZ failover | No persistence, no replication |
| Use when | Need HA, persistence, pub/sub, or complex types | Need a simple, horizontally scalable cache |
Exam cue: “cache must survive a node failure / needs replication” → Redis; “simple, scale-out object cache, no persistence needed” → Memcached; “speed up a read-heavy DynamoDB table without changing app logic” → DAX; “serve static and dynamic content to a global audience with low latency” → CloudFront.
A classic pattern worth memorizing: a read-heavy relational workload uses ElastiCache as a lazy-loading cache in front of RDS so repeated queries hit memory instead of the database.
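The lazy-loading (cache-aside) pattern can be sketched in Python as below. A plain dict with expiry timestamps stands in for ElastiCache. `query_rds` is a hypothetical stand-in for a real database query. With Redis you would typically do the same thing: GET the key, and on a miss query the database and SET the value with an expiry.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LazyLoadingCache:
    """Cache-aside: serve from cache on a hit; on a miss (or expired entry),
    query the database and populate the cache with a TTL."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                          # hit: database untouched
        value = self._load(key)                      # miss: go to the database
        self._store[key] = (value, now + self._ttl)  # populate for next time
        return value


db_calls = []


def query_rds(key: str) -> str:  # hypothetical stand-in for a SQL query
    db_calls.append(key)
    return f"row for {key}"


cache = LazyLoadingCache(query_rds, ttl_seconds=60)
cache.get("user:42")
cache.get("user:42")
print(len(db_calls))  # 1
```

The TTL bounds how stale a cached row can get. Only the first request for a key, or the first request after expiry, reaches the database.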
Performant Databases
Database performance questions usually come down to scaling reads and choosing the right engine.
- RDS Read Replicas scale read traffic by serving queries from asynchronous copies of the primary. Aurora and RDS for MySQL, MariaDB, and PostgreSQL allow up to 15, and some other engines have lower limits. Reach for them when “read queries are overwhelming the database.” (Don’t confuse them with Multi-AZ, which is for availability; the resilient architectures guide draws that line.)
- Amazon Aurora delivers up to five times the throughput of standard MySQL and three times that of standard PostgreSQL (AWS’s figures), auto-scales storage, and supports up to 15 low-latency replicas. Aurora Serverless scales capacity automatically for variable or unpredictable workloads, which makes it a strong answer when load is spiky and you don’t want to manage capacity.
- Amazon DynamoDB offers single-digit-millisecond performance at any scale; pair it with DAX for microsecond reads. On-demand capacity mode handles unpredictable traffic without provisioning.
- RDS Proxy pools and shares database connections, improving performance and resilience for applications (especially serverless) that open many short-lived connections.
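Applications get the benefit of read replicas by splitting traffic: writes go to the primary endpoint, and reads go to replicas. The Python sketch below is a hypothetical, deliberately naive router. A production router would not classify statements by checking for a leading SELECT. The endpoint names are placeholders. Aurora’s reader endpoint performs similar balancing for you at the connection level.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send writes to the primary; rotate reads across read replicas."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        # Naive classification for illustration: statements starting with
        # SELECT are reads, everything else is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary  # writes, or reads when no replica exists


router = ReadWriteRouter(
    "primary.db.example.internal",
    ["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders WHERE id = 7"))
print(router.endpoint_for("UPDATE orders SET status = 'shipped' WHERE id = 7"))
```

Replication is asynchronous, so a read issued right after a write may not see that write yet. Queries that must read their own writes belong on the primary.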
Exam cue: “offload reads from the primary” → read replicas; “millisecond key-value at massive scale” → DynamoDB; “thousands of Lambda functions exhausting DB connections” → RDS Proxy; “variable, unpredictable relational load, hands-off scaling” → Aurora Serverless.
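RDS Proxy keeps a pool of database connections and lets many short-lived clients share them. The in-process Python sketch below illustrates only that multiplexing idea, using a fixed-size pool. It is not how RDS Proxy is implemented. The proxy runs as a managed service outside your application, which is what lets separate Lambda invocations share connections. `connect` is a hypothetical stand-in for opening a real connection.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: callers borrow an existing connection instead of
    opening a new one, so the database never sees more than `size`."""

    def __init__(self, connect, size: int):
        self._idle: "queue.Queue" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._idle.get(timeout=timeout)  # waits while all are borrowed
        try:
            yield conn
        finally:
            self._idle.put(conn)                # return it for the next caller


opened = []


def connect():  # hypothetical stand-in for opening a database connection
    opened.append(object())
    return opened[-1]


pool = ConnectionPool(connect, size=5)
served = []


def handler(request_id: int) -> None:
    with pool.connection() as conn:
        served.append((request_id, id(conn)))  # pretend to run a query


threads = [threading.Thread(target=handler, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(served), len(opened))  # 50 5
```

Fifty concurrent callers are served while the database only ever sees five connections.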
Performant Networking & Content Delivery
Getting bytes to users quickly is its own sub-domain, and two services are routinely confused:
| | Amazon CloudFront | AWS Global Accelerator |
|---|---|---|
| What it does | CDN — caches content at edge locations | Routes traffic over the AWS backbone to the optimal endpoint |
| Caching? | Yes | No (it accelerates, doesn’t cache) |
| Best for | Cacheable web content, media, APIs | Non-cacheable traffic (TCP/UDP), gaming, VoIP, fast regional failover |
| IP addresses | DNS name resolving to changing edge IPs | Two static anycast IPs |
Exam cue: “cache and deliver static/dynamic web content globally” → CloudFront; “improve performance of a non-HTTP/non-cacheable global application, or need static anycast IPs and instant failover” → Global Accelerator.
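This exam cue reduces to a couple of questions about the traffic. The Python sketch below captures them as a hypothetical study aid. The function and its flags are invented for illustration and are not an AWS API.

```python
def pick_edge_service(protocol: str, needs_static_ips: bool = False,
                      needs_fast_regional_failover: bool = False) -> str:
    """Return the service the exam cue points to for a global workload."""
    if protocol.lower() not in {"http", "https"}:
        return "Global Accelerator"  # non-HTTP TCP/UDP, e.g. gaming or VoIP
    if needs_static_ips or needs_fast_regional_failover:
        return "Global Accelerator"  # two static anycast IPs, fast failover
    return "CloudFront"              # cacheable or dynamic web content, media, APIs


print(pick_edge_service("https"))  # CloudFront
print(pick_edge_service("udp"))    # Global Accelerator
```

A real architecture can use both services. For example, CloudFront can serve a game’s website while Global Accelerator fronts its UDP game servers.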
Other networking performance levers:
- Enhanced networking (ENA) and Elastic Fabric Adapter (EFA) for high-throughput, low-latency EC2 networking (HPC, ML).
- VPC endpoints keep traffic to AWS services (S3, DynamoDB) on the private AWS network, avoiding the public internet. See the VPC concepts guide.
- AWS Direct Connect provides a dedicated, consistent-bandwidth link from on-premises to AWS when VPN performance is insufficient.
Decoupling for Performance & Throughput
Decoupling isn’t only about resilience — it’s a performance pattern. Putting a queue or stream between a fast producer and a slower consumer lets each scale independently and absorbs spikes:
- Amazon SQS buffers requests so a backend can process at its own pace instead of being overwhelmed — the producer never blocks.
- Amazon Kinesis Data Streams ingests high-volume, real-time streaming data (clickstreams, telemetry, logs) for downstream processing.
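A tiny simulation makes the buffering effect easy to see. In the Python sketch below, a `deque` stands in for the SQS queue. A real consumer would poll with ReceiveMessage and call DeleteMessage after processing. A burst of messages arrives in one tick, and the consumer never handles more than its fixed capacity per tick. The backlog absorbs the spike and drains afterward. Names and numbers are illustrative.

```python
from collections import deque
from typing import List, Tuple


def simulate_buffer(arrivals: List[int], capacity_per_tick: int) -> Tuple[int, int]:
    """Queue-based load leveling. arrivals[i] messages land in the queue at
    tick i; the consumer processes at most capacity_per_tick per tick.
    Returns (peak backlog, ticks until the queue is empty)."""
    if capacity_per_tick <= 0:
        raise ValueError("consumer capacity must be positive")
    q = deque()
    peak = ticks = next_id = 0
    while ticks < len(arrivals) or q:
        if ticks < len(arrivals):
            for _ in range(arrivals[ticks]):  # producer enqueues without waiting on the consumer
                q.append(next_id)
                next_id += 1
        peak = max(peak, len(q))
        for _ in range(min(capacity_per_tick, len(q))):
            q.popleft()                       # consumer works at its own pace
        ticks += 1
    return peak, ticks


print(simulate_buffer([10, 50, 10], capacity_per_tick=20))  # (50, 4)
```

The backend never exceeds 20 messages per tick. The spike of 50 simply waits in the queue, and all 70 messages are done after four ticks.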
Exam cue: “smooth out a traffic spike so the backend isn’t overwhelmed” → SQS buffering; “ingest and process millions of real-time events per second” → Kinesis. The decoupling guide for SAA-C03 covers these messaging patterns in detail.
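In provisioned mode, each Kinesis shard accepts up to 1,000 records or 1 MiB per second of writes. A record’s partition key is MD5-hashed to pick the shard whose hash-key range contains it. The Python sketch below does the sizing arithmetic and mimics the key-to-shard mapping for a stream whose shards split the 128-bit hash range evenly. The helper names are hypothetical, and on-demand capacity mode manages shard count for you.

```python
import hashlib
import math

# Provisioned-mode write limits per shard.
SHARD_RECORDS_PER_S = 1_000
SHARD_MIB_PER_S = 1.0


def shards_needed(records_per_s: float, avg_record_kib: float) -> int:
    """Shards required so neither the record-count nor the byte limit is exceeded."""
    mib_per_s = records_per_s * avg_record_kib / 1024
    return max(
        1,
        math.ceil(records_per_s / SHARD_RECORDS_PER_S),
        math.ceil(mib_per_s / SHARD_MIB_PER_S),
    )


def shard_for(partition_key: str, shard_count: int) -> int:
    """MD5 the key to a 128-bit integer and find which of shard_count equal
    slices of [0, 2**128) it falls in (an even split is assumed)."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return (h * shard_count) >> 128


print(shards_needed(records_per_s=5_000, avg_record_kib=2))  # 10
print(shard_for("sensor-17", 10))
```

The same partition key always maps to the same shard, which is how Kinesis preserves per-key ordering. In this example, bytes rather than record count drive the shard total.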
A Worked Mental Model
Picture a read-heavy global web application the exam might hand you and how each performance lever fits:
- Edge: CloudFront caches static assets and accelerates dynamic content for users worldwide.
- Compute: EC2 in an Auto Scaling Group (right-sized instance family) or Lambda/Fargate for spiky, event-driven pieces.
- Caching layer: ElastiCache (Redis) in front of the database absorbs repeated reads at sub-millisecond latency.
- Database: Aurora with read replicas for relational reads, or DynamoDB + DAX for key-value access at scale.
- Storage: gp3 for general volumes, io2 for the I/O-hungry database, S3 (with Transfer Acceleration) for objects.
- Decoupling: SQS buffers write spikes so the backend scales smoothly.
If you can assemble that from a requirements paragraph and justify each choice on performance grounds — while keeping cost reasonable — you’ve mastered the domain. The Well-Architected Framework, whose Performance Efficiency pillar underpins this entire domain, is worth a parallel read.
Practice in Realistic Exam Conditions
High-performing architecture questions are scenario-heavy: a paragraph of requirements, four plausible services, and one best answer that balances performance against cost. The only way to get fast is to drill realistic questions until “read-heavy DynamoDB” instantly maps to DAX and “global cacheable content” instantly maps to CloudFront.
Sailor.sh’s AWS Certified Solutions Architect - Associate (SAA-C03) Mock Exam Bundle gives you exam-style questions that mirror the real format and difficulty — including the performance scenarios covered here — with detailed explanations for every answer. Working through them is the most efficient way to turn this knowledge into the fast, reflexive pattern-matching the exam demands.
Pair the practice with a structured plan like the AWS Solutions Architect study plan, review the full exam topics list, and shore up the adjacent resilient architectures domain (26%). Together with this domain, that covers about half the exam.
Conclusion
The High-Performing Architectures domain is nearly a quarter of the SAA-C03, and it rewards breadth applied with judgment. You don’t need to memorize every spec sheet — you need to match a workload to the right compute family, the right EBS volume, the right cache, the right database scaling strategy, and the right content-delivery service, then let it scale elastically. Learn the key distinctions that the exam loves to blur — Redis vs Memcached, CloudFront vs Global Accelerator, read replicas vs Multi-AZ, io2 vs st1 — and practice mapping scenarios to services until it’s reflexive. Do that, and the performance questions become some of the most predictable points on the exam. Combine it with the rest of the SAA-C03 curriculum and you’ll walk in ready.
Frequently Asked Questions
How much of the SAA-C03 exam is the high-performing architectures domain?
“Design High-Performing Architectures” is about 24% of the SAA-C03 score, the third-largest of the four domains after Design Secure Architectures (30%) and Design Resilient Architectures (26%). It spans compute, storage, databases, caching, and networking, so it’s worth deep preparation. See the exam domains strategy for the full weighting.
When should I use ElastiCache Redis versus Memcached?
Choose Redis when you need persistence, replication, Multi-AZ failover, pub/sub, or rich data structures (lists, sets, sorted sets). Choose Memcached for a simple, multi-threaded, horizontally scalable key/value cache with no persistence or replication requirement.
What’s the difference between CloudFront and Global Accelerator?
CloudFront is a CDN that caches content at edge locations — ideal for static and dynamic web content and media. Global Accelerator doesn’t cache; it routes traffic over the AWS global backbone to the optimal endpoint and provides two static anycast IPs — ideal for non-cacheable TCP/UDP applications, gaming, VoIP, and fast regional failover.
How do read replicas improve performance?
RDS and Aurora read replicas are asynchronous copies of the primary database that serve read queries, offloading the primary so it can focus on writes. They scale read traffic; they are not a high-availability mechanism (that’s Multi-AZ). Aurora supports up to 15 replicas with very low replication lag.
Which EBS volume type should I choose on the exam?
Use gp3 as the cost-effective general-purpose default; io2/io1 (Provisioned IOPS) for I/O-intensive databases needing sustained high IOPS and low latency; st1 for large sequential, throughput-heavy workloads like big data; and sc1 for cold, infrequently accessed data at the lowest cost.
What is DynamoDB DAX and when do I use it?
DynamoDB Accelerator (DAX) is a fully managed, in-memory cache for DynamoDB that delivers microsecond read latency for read-heavy workloads. Because DAX is API-compatible with DynamoDB, you switch to the DAX client instead of rewriting your DynamoDB calls. Use it when a DynamoDB table is read-intensive and single-digit-millisecond latency isn’t fast enough.