Cloud Native Architecture for the KCNA Exam: Observability, the CNCF Landscape, Service Mesh & Microservices

The KCNA exam rewards breadth over depth. You don’t need to write a Prometheus recording rule from memory — you need to recognize that “aggregated time-series of request latency” means metrics, that a sidecar handling mTLS between services means service mesh, and that a single trace ID stitching a request across ten microservices means distributed tracing. The Cloud Native Architecture domain is roughly 12% of the Kubernetes and Cloud Native Associate (KCNA) exam, but it’s the conceptual glue that ties the other domains together — and it’s where vocabulary matters most.

This guide is written from a practitioner’s perspective and organized the way the exam thinks: observability first (it carries the most question weight inside this domain), then the CNCF ecosystem, autoscaling, service mesh, and the microservices and cloud native principles that underpin everything. Every section flags the signal words that point you at the right answer under time pressure.

If you want the full exam picture first, start with the KCNA Exam Guide 2026 and the KCNA study guide, then come back here to go deep on architecture. For the foundational layer this domain sits on top of, the KCNA Kubernetes architecture fundamentals post covers the control plane and core objects.

What “Cloud Native” Actually Means

Before the tools, get the definition straight — the exam tests it directly. The CNCF’s own definition is worth internalizing: cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. The hallmarks are containers, service meshes, microservices, immutable infrastructure, and declarative APIs.

The recurring themes you should be able to recite:

Resilience — the system tolerates failure of individual components without taking down the whole.
Manageability — infrastructure and apps are described declaratively and changed through automation, not manual SSH.
Observability — you can understand the system’s internal state from its external outputs.
Loose coupling — services interact through well-defined APIs, so they can be deployed and scaled independently.

These four words are the lens the rest of this domain looks through. When a question describes a benefit (“teams can deploy independently,” “a failed pod is replaced automatically”), it’s testing whether you can map the scenario to one of these principles.

Observability: The Three Pillars

Observability is the single most-tested concept in this domain. The classic framing is the three pillars: metrics, logs, and traces. Each answers a different question.

Pillar	Question it answers	Data shape	Canonical tool
Metrics	“Is something wrong, and how much?”	Numeric time-series (counters, gauges, histograms)	Prometheus
Logs	“What exactly happened?”	Timestamped, often unstructured text events	Fluentd / Fluent Bit (collection) + Loki (storage; a Grafana Labs project, not CNCF)
Traces	“Where in the request path did it happen?”	Spans linked by a trace ID across services	Jaeger, OpenTelemetry

A useful mental shortcut: metrics tell you that something broke, logs tell you what broke, traces tell you where it broke. Many exam questions are just that mapping in disguise.
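To make the "data shape" column concrete, here is a minimal Python sketch of what one record of each signal might look like. The class names, field names, and sample values are illustrative assumptions, not the schema of Prometheus, Loki, or Jaeger.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricSample:
    """One point in a numeric time series: name + labels + value at a time."""
    name: str
    labels: dict
    value: float
    timestamp: float


@dataclass
class LogRecord:
    """One timestamped event, usually free-form text."""
    timestamp: float
    message: str


@dataclass
class Span:
    """One timed unit of work; spans sharing a trace_id form a trace."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


# Hypothetical telemetry for a single slow checkout request.
metric = MetricSample("http_request_duration_seconds", {"service": "checkout"}, 2.4, 1000.0)
log = LogRecord(1000.0, "payment gateway timed out")
spans = [
    Span("abc123", "s1", None, "checkout", 0.0, 2.4),   # root span
    Span("abc123", "s2", "s1", "inventory", 0.1, 0.2),
    Span("abc123", "s3", "s1", "payment", 0.3, 2.3),
]

# The metric says "that" (latency is high), the log says "what" (a timeout),
# and the slowest child span says "where" the time went.
children = [s for s in spans if s.parent_id is not None]
slowest = max(children, key=lambda s: s.duration)
print(slowest.service)
```

The point is the shape, not the code: a number with labels, a line of text, and a set of timed spans tied together by one ID.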

Metrics and Prometheus

Prometheus is the de facto cloud native metrics system and a graduated CNCF project. Know these facts:

It uses a pull model — Prometheus scrapes HTTP /metrics endpoints on a schedule, rather than receiving pushes.
Metrics are time-series identified by a name plus key/value labels (e.g., http_requests_total{method="GET", status="200"}).
You query with PromQL.
The four core metric types are counter (only goes up, apart from resetting to zero when the process restarts), gauge (up and down), histogram, and summary.
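The name-plus-labels identity is easiest to see in the text a /metrics endpoint returns. The sketch below is a toy counter written for illustration, not the official Prometheus client library. The class name ToyCounter and the sample label values are assumptions.

```python
class ToyCounter:
    """A counter keyed by label values; each distinct label set is its own time series."""

    def __init__(self, name: str, help_text: str):
        self.name = name
        self.help_text = help_text
        self._values = {}  # maps a sorted tuple of (label, value) pairs to a float

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        if amount < 0:
            raise ValueError("counters only increase")
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        """Render every series in the style of the Prometheus text format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_text}}} {value}")
        return "\n".join(lines)


requests = ToyCounter("http_requests_total", "Total HTTP requests.")
requests.inc(method="GET", status="200")
requests.inc(method="GET", status="200")
requests.inc(method="POST", status="500")
print(requests.render())
```

Two label combinations produce two separate series under one metric name, which is exactly what a scrape would collect.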

A query over a counter that you’d recognize on the exam:

# Per-second request rate over the last 5 minutes
rate(http_requests_total{job="api"}[5m])
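If it helps to see why rate() is applied to counters, here is a simplified Python sketch of the idea: sum the increases between samples, treat a drop as a counter reset, and divide by the elapsed time. It is an approximation for intuition only. The real PromQL function also extrapolates to the edges of the window, which this sketch leaves out, and the function name simple_rate is made up for the example.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when there is not enough data to compute a rate.
    """
    if len(samples) < 2:
        return None
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        return None

    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # The counter went down, so the process restarted and began
            # counting from zero again; everything it shows now is new.
            increase += value
        else:
            increase += value - previous
        previous = value
    return increase / duration


# One sample per minute; 60 requests arrive each minute.
steady = [(0, 100), (60, 160), (120, 220)]
print(simple_rate(steady))

# Same traffic, but the process restarts before the third scrape.
with_reset = [(0, 100), (60, 160), (120, 10), (180, 70)]
print(simple_rate(with_reset))
```

This is also why a raw counter value is rarely graphed directly: the number itself is arbitrary, while its rate of change is the useful signal.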

Prometheus scrapes; it does not draw dashboards. That’s the next tool.

Visualization and Grafana

Grafana is the visualization layer. It queries data sources (Prometheus chief among them) and renders dashboards, panels, and alerts. The signal word “dashboard” almost always points at Grafana. Prometheus stores and queries; Grafana displays.

Logging and the EFK/Loki Stacks

Logs are collected by a node-level agent and shipped to a backend:

Fluentd and its lightweight sibling Fluent Bit are the CNCF log collectors/forwarders, usually deployed as a DaemonSet (one collector per node).
The backend is commonly Elasticsearch (the “EFK” stack: Elasticsearch + Fluentd + Kibana) or Loki (Grafana’s log store, queried with LogQL).

Signal: “collect logs from every node and forward them” → a DaemonSet running Fluent Bit. If you’re shaky on why a per-node agent is a DaemonSet, the workloads side is covered in the broader Kubernetes scheduling material.
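As a rough illustration of what a per-node collector does with each line before forwarding it, the Python sketch below parses a raw text line into a structured record and attaches node and pod metadata. The line format, the function name to_record, and the node and pod names are all assumptions. Fluentd and Fluent Bit do this through their own parser and filter configuration, not through code like this.

```python
import json
import re

# Hypothetical line format: "<ISO timestamp> <LEVEL> <free-text message>"
LINE_PATTERN = re.compile(r"^(?P<time>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def to_record(raw_line: str, node: str, pod: str) -> dict:
    """Parse one raw log line and attach node/pod metadata before forwarding."""
    match = LINE_PATTERN.match(raw_line)
    if match is None:
        # Keep lines that do not fit the pattern rather than dropping them.
        record = {"message": raw_line}
    else:
        record = match.groupdict()
    record["node"] = node
    record["pod"] = pod
    return record


record = to_record(
    "2026-01-15T10:00:00Z ERROR payment gateway timed out",
    node="node-a",
    pod="checkout-7d9f",
)
print(json.dumps(record))
```

Because the agent runs on every node, each record can carry the node name without the application knowing where it is scheduled.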

Tracing, Jaeger, and OpenTelemetry

In a microservices system, a single user request fans out across many services. Distributed tracing stitches that journey together:

A trace represents one request end-to-end; it’s composed of spans, each a timed unit of work in one service.
A trace ID propagates through HTTP/gRPC headers so every service tags its span with the same ID.
Jaeger is the graduated CNCF tracing backend that stores and visualizes these traces.
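Trace ID propagation can be sketched in a few lines of Python. The header layout below follows the W3C Trace Context traceparent format (version, trace ID, parent span ID, flags). The function names and the three-service call chain are hypothetical, and in practice an instrumentation library such as an OpenTelemetry SDK would handle this for you.

```python
import secrets


def new_traceparent() -> str:
    """Start a trace: a fresh 16-byte trace ID and 8-byte span ID, hex-encoded."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming: str) -> str:
    """Keep the caller's trace ID, but mint a new span ID for this service's work."""
    version, trace_id, _parent_span_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(traceparent: str) -> str:
    return traceparent.split("-")[1]


def span_id_of(traceparent: str) -> str:
    return traceparent.split("-")[2]


# Hypothetical call chain: gateway -> orders -> payments.
gateway_header = new_traceparent()
orders_header = child_traceparent(gateway_header)
payments_header = child_traceparent(orders_header)

for name, header in [("gateway", gateway_header),
                     ("orders", orders_header),
                     ("payments", payments_header)]:
    print(name, header)
```

Every hop shares the same trace ID while carrying its own span ID, which is what lets a backend such as Jaeger reassemble the request path.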

OpenTelemetry (OTel) is the most important name in modern observability and a frequent exam target. It is a CNCF project that provides vendor-neutral APIs, SDKs, and a collector for generating and exporting all three signals — metrics, logs, and traces. The key idea: you instrument your code once with OpenTelemetry, then export to any backend (Prometheus, Jaeger, a commercial vendor) without re-instrumenting. If a question mentions “vendor-neutral instrumentation” or “a single standard for telemetry,” the answer is OpenTelemetry.

[ app + OTel SDK ] --> [ OTel Collector ] --> Prometheus (metrics)
                                          --> Jaeger     (traces)
                                          --> Loki       (logs)
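The "instrument once, export anywhere" idea in the diagram comes down to an interface between the application and its backends. The Python sketch below shows that decoupling with made-up classes (ToyCollector, InMemoryExporter). It is not the OpenTelemetry API, only an illustration of why swapping a backend does not touch application code.

```python
from typing import Protocol


class Exporter(Protocol):
    def export(self, signal: str, payload: dict) -> None: ...


class InMemoryExporter:
    """Stand-in for a real backend; it just remembers what it was sent."""

    def __init__(self, name: str):
        self.name = name
        self.received = []

    def export(self, signal: str, payload: dict) -> None:
        self.received.append((signal, payload))


class ToyCollector:
    """Routes each signal type to whichever exporters are configured for it."""

    def __init__(self, routes: dict):
        self.routes = routes  # e.g. {"metrics": [exporter, ...]}

    def receive(self, signal: str, payload: dict) -> None:
        # Signals with no configured route are silently dropped here.
        for exporter in self.routes.get(signal, []):
            exporter.export(signal, payload)


def handle_request(collector: ToyCollector) -> None:
    """Application code: emits telemetry without knowing any backend."""
    collector.receive("metrics", {"name": "http_requests_total", "value": 1})
    collector.receive("traces", {"span": "GET /checkout"})
    collector.receive("logs", {"message": "checkout ok"})


metrics_backend = InMemoryExporter("prometheus-like")
traces_backend = InMemoryExporter("jaeger-like")
collector = ToyCollector({"metrics": [metrics_backend], "traces": [traces_backend]})
handle_request(collector)
print(len(metrics_backend.received), len(traces_backend.received))
```

Changing where traces go means passing a different routes mapping to the collector. The handle_request function stays exactly as written.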

The CNCF Landscape and Project Maturity

The KCNA explicitly covers the CNCF ecosystem. You won’t be asked to name all 150+ projects, but you must understand the maturity levels, because they appear as direct questions.

Maturity level	Meaning	Examples
Sandbox	Early-stage, experimental, encourages innovation	Newer/emerging projects
Incubating	Growing adoption, used in production by several orgs	Many mid-stage projects
Graduated	Mature, widely adopted, strong governance	Kubernetes, Prometheus, Envoy, Helm, etcd, containerd, Fluentd, Jaeger, CoreDNS

Other facts worth memorizing:

The CNCF is part of the Linux Foundation and is the vendor-neutral home for cloud native projects.
Kubernetes was the first CNCF project and the first to graduate.
The landscape (landscape.cncf.io) organizes projects into categories: orchestration, observability, networking, storage, security, CI/CD, and more.
Projects are donated by companies and the community; CNCF provides neutral governance, not ownership by any single vendor.

Autoscaling: HPA, VPA, and Cluster Autoscaler

Scalability is a core cloud native promise, and KCNA tests the three autoscalers by name. Know what each one scales:

Autoscaler	What it changes	Trigger
Horizontal Pod Autoscaler (HPA)	Number of pod replicas	CPU/memory utilization or custom metrics
Vertical Pod Autoscaler (VPA)	CPU/memory requests of pods	Observed resource usage over time
Cluster Autoscaler	Number of nodes	Pending pods that can’t be scheduled

The exam loves the distinction between horizontal (more copies) and vertical (bigger copies) scaling. A quick HPA command you’d recognize:

kubectl autoscale deployment web --cpu-percent=70 --min=2 --max=10
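Behind that command, the HPA's core calculation is a simple ratio, documented by Kubernetes as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The Python sketch below applies that formula with the min/max bounds from the command. The function name is made up, the 10% tolerance reflects the documented default as far as I know, and the sketch ignores stabilization windows and pod readiness handling.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA calculation for a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: do not scale, which avoids flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below --min or above --max.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas averaging 90% CPU against a 70% target -> scale out.
print(desired_replicas(4, 90, 70, min_replicas=2, max_replicas=10))

# 72% is within tolerance of 70% -> stay at 4.
print(desired_replicas(4, 72, 70, min_replicas=2, max_replicas=10))

# Very low usage would suggest 1 replica, but --min=2 wins.
print(desired_replicas(4, 10, 70, min_replicas=2, max_replicas=10))
```

You won't be asked to compute this on the KCNA, but knowing that the HPA compares an observed value to a target explains why it needs a metrics source to work at all.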

Signal mapping: “add more pods when CPU is high” → HPA; “right-size a pod’s requests” → VPA; “add a node because a pod is Pending for lack of capacity” → Cluster Autoscaler. A newer name that sometimes appears is KEDA (Kubernetes Event-Driven Autoscaling), which scales on event sources like queue length — including scaling to zero.

Service Mesh

A service mesh manages service-to-service communication without changing application code. It does this with a sidecar proxy (most commonly Envoy, a graduated CNCF project) injected next to each application container. The mesh’s control plane configures all the sidecars.

What a mesh gives you — and the exam phrases these as benefits:

mTLS (mutual TLS) for automatic, encrypted service-to-service authentication.
Traffic management: canary releases, traffic splitting, retries, timeouts, circuit breaking.
Observability: the sidecars emit consistent metrics, logs, and traces for every call — “free” golden-signal telemetry.
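Two of the traffic-management features above, weighted traffic splitting and retries, can be sketched in Python to show what the proxy does on the application's behalf. The function names, the 90/10 weights, and the failure scenario are assumptions for illustration. A real mesh expresses these as declarative configuration applied to the proxies, not as application code.

```python
import random


def pick_backend(weights: dict, rng: random.Random) -> str:
    """Choose a backend version with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def call_with_retries(send, attempts: int = 3):
    """Retry a failing call up to `attempts` times before giving up."""
    last_error = None
    for _ in range(attempts):
        try:
            return send()
        except ConnectionError as error:
            last_error = error
    raise last_error


# Canary release: send roughly 10% of requests to v2.
rng = random.Random(42)
counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[pick_backend({"v1": 90, "v2": 10}, rng)] += 1
print(counts)

# A hypothetical upstream that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream reset")
    return "ok"


result = call_with_retries(flaky)
print(result, calls["n"])
```

The application only ever makes one call to one service name. The split and the retries happen in the sidecar, which is the sense in which a mesh works "without changing application code."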

The two-plane model to remember:

Plane	Role	Examples
Data plane	The sidecar proxies that carry actual traffic	Envoy (used by Istio), linkerd2-proxy (used by Linkerd)
Control plane	Configures and coordinates the proxies	Istio’s istiod, Linkerd’s control plane

Signal words: “sidecar,” “mTLS between services,” and “manage traffic without changing app code” all point at a service mesh. Note the contrast with an Ingress controller, which handles north-south (external→cluster) traffic, whereas a mesh handles east-west (service↔service) traffic.

Microservices and Application Architecture Patterns

The final thread in this domain is the architectural style cloud native assumes: microservices. Be able to compare it against the monolith.

Aspect	Monolith	Microservices
Deployment unit	One large application	Many small, independent services
Scaling	Scale the whole app	Scale individual services
Coupling	Tight	Loose, via APIs
Team autonomy	Limited	High — teams own services end-to-end
Failure blast radius	Whole app	Ideally one service
Complexity cost	Lower operationally	Higher — networking, observability, data consistency

Patterns the exam may name in passing:

API Gateway — a single entry point that routes to backend services.
Sidecar pattern — a helper container (logging agent, proxy) sharing a pod with the main app.
Twelve-Factor App — a set of principles (config in the environment, stateless processes, disposability) that make services cloud native.
Immutable infrastructure — you replace, not patch; a new image is rolled out rather than mutating a running container.
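The Twelve-Factor "config in the environment" principle is the one most easily shown in code. In this minimal Python sketch the same image can run in any environment because nothing is hard-coded. The variable names (DATABASE_URL, LOG_LEVEL, REQUEST_TIMEOUT_SECONDS), the defaults, and the example URL are assumptions chosen for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    request_timeout_seconds: float


def load_settings(env=os.environ) -> Settings:
    """Read all configuration from the environment; fail fast if a required value is missing."""
    if "DATABASE_URL" not in env:
        raise RuntimeError("DATABASE_URL must be set")
    return Settings(
        database_url=env["DATABASE_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "5")),
    )


# Passing a plain dict stands in for the process environment here.
settings = load_settings({"DATABASE_URL": "postgres://db.example.internal/orders"})
print(settings)
```

In Kubernetes, values like these would typically be supplied through ConfigMaps and Secrets, so a configuration change is a new rollout instead of an edit to a running container.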

A balanced takeaway the exam expects: microservices buy independent deployability and scalability at the cost of operational complexity — which is precisely why the observability, service mesh, and autoscaling tooling above exists. The architecture and the tooling are two halves of the same story.

KCNA Cloud Native Architecture Cheat Sheet

When a question describes a need, map it fast:

Scenario / signal words	Answer
Numeric time-series, “scrape,” pull model	Prometheus (metrics)
“Dashboard,” visualization	Grafana
Per-node log collection, “forward logs”	Fluentd / Fluent Bit (DaemonSet)
“Trace ID across services,” spans	Distributed tracing / Jaeger
“Vendor-neutral instrumentation,” one standard for telemetry	OpenTelemetry
“Sidecar,” mTLS, east-west traffic	Service mesh (Istio/Linkerd, Envoy data plane)
Add more pods on high CPU	HPA
Right-size pod requests	VPA
Add nodes for Pending pods	Cluster Autoscaler
Mature, widely adopted CNCF project	Graduated
Vendor-neutral home for cloud native projects	CNCF (Linux Foundation)

Exam-Day Strategy for This Domain

It’s multiple choice, not hands-on. KCNA is a 90-minute, ~60-question MCQ exam — there’s no terminal. Speed comes from recognizing vocabulary, not typing commands. The KCNA exam format post breaks down the structure.
Anchor on signal words. Most architecture questions hinge on one term — “sidecar,” “graduated,” “trace,” “pull-based.” Train yourself to spot it and the answer usually follows.
Don’t over-engineer. When two answers seem plausible, the simpler cloud native-aligned choice (declarative, loosely coupled, automated) is usually right.
Mind the overlaps. Observability concepts reappear under Kubernetes Fundamentals and Container Orchestration; the KCNA practice questions set will show you how the same idea gets asked in different domains.

Practice Until the Vocabulary Is Automatic

Cloud Native Architecture is a recognition skill: the difference between a 70% and a 95% here is how quickly “mTLS sidecar” maps to “service mesh” without second-guessing. The only reliable way to build that reflex is repeated exposure to exam-style questions across all the CNCF concepts — Prometheus vs. Grafana, the three autoscalers, project maturity levels, and the three pillars of observability.

Sailor.sh’s Kubernetes and Cloud Native Associate (KCNA) Mock Exam Bundle gives you full-length, timed mock exams with detailed explanations that mirror the real exam’s format and difficulty, including the architecture and observability scenarios covered here. Pair the mocks with a structured plan from the KCNA study guide. If you’re weighing the credential against the CKA, the KCNA vs CKA comparison and the “Is KCNA worth it?” post will help you place it in your roadmap. For the security-flavored sibling of this domain, the 4Cs of cloud native security covers the KCSA angle.

Frequently Asked Questions

What are the three pillars of observability?

Metrics, logs, and traces. Metrics are numeric time-series that tell you that something is wrong (Prometheus). Logs are timestamped event records that tell you what happened (Fluentd/Loki). Traces follow a single request across services to tell you where a problem occurred (Jaeger). OpenTelemetry is the vendor-neutral standard for generating all three.

What’s the difference between Prometheus and Grafana?

Prometheus collects and stores metrics by scraping /metrics endpoints and lets you query them with PromQL. Grafana is the visualization layer — it queries Prometheus (and other sources) and renders dashboards and alerts. Prometheus stores and queries; Grafana displays. They’re commonly used together.

What is a service mesh and when do you need one?

A service mesh manages service-to-service (east-west) communication using sidecar proxies injected next to each app (Envoy in Istio’s case, linkerd2-proxy in Linkerd’s), coordinated by the mesh’s control plane. It provides mTLS encryption, traffic management (canary, retries, timeouts), and uniform observability, all without changing application code. You reach for it when you have many microservices and need consistent security and traffic control across them.

What’s the difference between HPA, VPA, and the Cluster Autoscaler?

HPA changes the number of pod replicas based on CPU/memory or custom metrics. VPA changes the CPU/memory requests of individual pods. The Cluster Autoscaler changes the number of nodes in the cluster, adding nodes when pods can’t be scheduled and removing underused ones. Horizontal means more copies; vertical means bigger copies.

What do CNCF maturity levels (Sandbox, Incubating, Graduated) mean?

They signal project maturity. Sandbox projects are early-stage and experimental. Incubating projects have growing production adoption. Graduated projects (like Kubernetes, Prometheus, Envoy, and Helm) are mature, widely adopted, and have strong governance. The level often appears as a direct KCNA question, so memorize a few examples of each.

How much of the KCNA exam is Cloud Native Architecture?

About 12% of the exam. It’s the smallest domain by weight, but its concepts — observability, the CNCF landscape, autoscaling, and service mesh — overlap with Container Orchestration and Application Delivery, so the effective footprint is larger. See the KCNA exam guide for the full domain weighting.