Microservices architecture questions probe your ability to design, build, and operate distributed systems at scale. This guide covers the full interview landscape — from core architecture principles to production war stories on resilience, observability, and data consistency across services.
Monolith — the entire application is a single deployable unit. Simple to develop and test initially but becomes hard to scale and deploy as it grows.
SOA (Service-Oriented Architecture) — decomposes applications into services, but services are typically coarse-grained, share a database, and communicate over heavyweight protocols (SOAP, ESB).
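Microservices: fine-grained services, each independently deployable, owning its own data store, and communicating over lightweight protocols (REST, gRPC, or messaging).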
A bounded context (from Domain-Driven Design) is a self-contained domain model with clearly defined boundaries — a "User" inside the Billing context has different attributes than a "User" in the Social context. Each bounded context is a natural microservice boundary.
Why it matters for sizing:
Rule: let the team structure drive the service boundary (Conway's Law — your system mirrors your communication structure). A 2-pizza team owning one bounded context is the right starting point.
Advantages:
Disadvantages:
The Strangler Fig pattern is the safest way to migrate from a monolith to microservices. Named after a fig tree that gradually envelops its host:
The monolith continues running throughout — users see no disruption. You migrate incrementally, validating each extracted service before moving on. This is far safer than a "big bang" rewrite.
The API Gateway is the single entry point for all clients. It should handle cross-cutting concerns so services don't have to duplicate them:
What it should NOT do: business logic. A gateway that knows about order states or user accounts has become a service itself and creates a coupling point.
Tools: Spring Cloud Gateway (Java/reactive), APISIX, Kong, AWS API Gateway.
A BFF is a separate backend service tailored for one specific frontend client (mobile app, web app, third-party API). Instead of one generic API that all clients use:
Each BFF is owned by the frontend team that uses it. This avoids the "API designed by committee" problem where one API tries to serve all clients poorly. Downsides: more services to maintain, potential duplication across BFFs.
A sidecar is a helper container deployed alongside the main service container in the same pod (Kubernetes). It handles infrastructure concerns so the service doesn't have to:
The sidecar shares the same network namespace as the main container — all traffic passes through it. This lets you add cross-cutting capabilities to services without changing their code, and works across polyglot services.
A service mesh is a dedicated infrastructure layer that handles service-to-service communication. It consists of sidecar proxies (data plane) + a control plane (Istio, Linkerd).
Problems it solves — without changing application code:
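Trade-offs: significant operational complexity. Sidecar proxies add latency on every hop (on the order of milliseconds; measure it for your own workload) and consume CPU and memory in every pod. Only justified in large deployments with many services where the duplication of cross-cutting code across services would be worse.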
Orchestration — a central coordinator tells each service what to do and when. Like a conductor. The coordinator knows the entire flow.
// Orchestrator calls each service in sequence:
OrderOrchestrator:
1. Call InventoryService.reserve()
2. Call PaymentService.charge()
3. Call ShippingService.schedule()
4. Call NotificationService.confirm()
Choreography — each service reacts to events published by others. No central coordinator. Each service knows its own responsibilities and publishes events for the next step.
OrderPlaced → Inventory listens → InventoryReserved
InventoryReserved → Payment listens → PaymentCharged
PaymentCharged → Shipping listens → ShipmentScheduled
Tradeoffs: Orchestration is easier to reason about (centralised flow) but creates coupling to the orchestrator. Choreography is more decoupled but harder to trace — you need good distributed tracing to follow a request across events.
The 12-factor methodology defines how to build portable, scalable, cloud-native services. Most relevant factors for microservices:
Synchronous (REST/gRPC) — caller waits for the response. Use when:
Asynchronous (Kafka/RabbitMQ) — caller publishes an event and moves on. Use when:
gRPC is a high-performance RPC framework that uses Protocol Buffers (binary) over HTTP/2. Advantages over REST/JSON:
Use gRPC when: internal service-to-service calls where performance matters, streaming scenarios (real-time updates, large data), polyglot microservices needing a typed contract.
Use REST when: public APIs (browser-friendly), simple CRUD, tooling/ecosystem maturity matters more than raw performance.
In dynamic environments, service instances start/stop/move. Hardcoding IPs fails. Service discovery solves this:
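Client-side discovery (Eureka + Ribbon, or Spring Cloud LoadBalancer in current Spring Cloud releases, where Ribbon is in maintenance mode): the client queries the registry and load-balances across the returned instances itself.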
Server-side discovery (Kubernetes Service DNS / AWS ALB): the client calls a fixed DNS name; the infrastructure routes to a healthy instance.
# Kubernetes: every service gets a stable DNS name
http://order-service:8080/api/orders
# DNS resolves to a ClusterIP that load-balances across all pods
In modern Kubernetes environments, Kubernetes Services replace Eureka — built-in DNS, health-checked endpoints, no extra component to run. Eureka is primarily used in VM-based deployments with Spring Cloud.
In microservices, integration tests across services are slow and fragile. Consumer-driven contracts (CDC) solve this with a faster feedback loop:
// Consumer side (Pact-style pseudocode, not a specific library API):
given("user 1 exists")
    .upon_receiving("a request for user 1")
    .with(method: GET, path: "/users/1")
    .will_respond_with(status: 200,
        body: {id: 1, name: "Alice"}); // the contract
Benefits: each service validates in isolation; breaking changes are caught before deployment, not in production. Tools: Pact (polyglot), Spring Cloud Contract (Java-native).
In Event-Driven Architecture (EDA), services communicate by publishing and subscribing to events. The publisher doesn't know who consumes its events — it just publishes to a topic.
// OrderService publishes — knows nothing about consumers:
kafka.send("order-events", new OrderPlacedEvent(orderId, items, total));
// InventoryService subscribes independently, in its own consumer group:
@KafkaListener(topics = "order-events", groupId = "inventory-service")
void on(OrderPlacedEvent e) { reserve(e.getItems()); }
// EmailService uses a different consumer group, so it also receives every event:
@KafkaListener(topics = "order-events", groupId = "email-service")
void on(OrderPlacedEvent e) { sendConfirmation(e); }
This enables: adding new consumers without touching the publisher, temporal decoupling (consumer can be offline and catch up), and fan-out to many subscribers. Downside: eventual consistency, harder to trace end-to-end flows.
The dual-write problem: when a service must both update its database AND publish an event, these two operations cannot be atomic across different systems. If the DB commit succeeds but Kafka publish fails, data is inconsistent.
Outbox pattern solution:
// In one transaction:
@Transactional
public Order placeOrder(OrderRequest req) {
    Order order = orderRepo.save(new Order(req));
    outboxRepo.save(new OutboxEvent("order-placed", toJson(order)));
    return order; // both committed atomically
}
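Debezium tails the DB transaction log (change data capture, not to be confused with the consumer-driven contracts above) and publishes each outbox row to Kafka: no polling queries against the table, and at-least-once delivery, so consumers must tolerate duplicates.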
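The relay side of the outbox can also be sketched in plain Java. This is a minimal sketch that assumes a polling relay instead of log tailing, with an in-memory list standing in for the outbox table; the class and method names (OutboxRelaySketch, append, relay) are hypothetical. It shows where at-least-once delivery comes from: a row is marked as sent only after the publish call returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for an outbox table plus a polling relay.
final class OutboxRelaySketch {
    static final class OutboxEvent {
        final String type;
        final String payload;
        boolean sent; // would be a column in the outbox table

        OutboxEvent(String type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    private final List<OutboxEvent> outbox = new ArrayList<>();

    // Called inside the same local transaction as the business write.
    void append(String type, String payload) {
        outbox.add(new OutboxEvent(type, payload));
    }

    // Publishes every unsent event in order and returns how many were delivered.
    // An event is marked sent only after the publisher returns normally, so a
    // failure leaves it in place for the next run (at-least-once delivery).
    int relay(Consumer<OutboxEvent> publisher) {
        int delivered = 0;
        for (OutboxEvent event : outbox) {
            if (event.sent) continue;
            try {
                publisher.accept(event);
                event.sent = true;
                delivered++;
            } catch (RuntimeException e) {
                break; // preserve ordering: retry from this event on the next run
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        OutboxRelaySketch sketch = new OutboxRelaySketch();
        sketch.append("order-placed", "{\"orderId\":1}");
        int first = sketch.relay(e -> { throw new IllegalStateException("broker down"); });
        int second = sketch.relay(e -> System.out.println("published " + e.type));
        System.out.println("delivered: " + first + " then " + second);
    }
}
```

If the process crashed after publishing but before marking the row, the next run would publish the same event again, which is why consumers need to be idempotent.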
An operation is idempotent if calling it multiple times produces the same result as calling it once. In microservices, retries are common (network failures, circuit breakers) — if operations aren't idempotent, retries cause duplicates.
// Client sends idempotency key in every request:
POST /payments
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
// Server: check if this key was already processed:
@PostMapping("/payments")
public PaymentResult pay(@RequestHeader("Idempotency-Key") String key,
                         @RequestBody PaymentRequest req) {
    return idempotencyCache.computeIfAbsent(key,
        k -> paymentService.process(req));
}
For Kafka consumers: store processed message offsets or use a seen-message cache keyed on message ID. Natural idempotency: PUT /users/42 with the full state is idempotent; POST /add-5-to-balance is not.
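The seen-message approach for consumers might look like the following. It is a minimal sketch with an in-memory set and a hypothetical IdempotentConsumerSketch class; a real consumer would persist the seen IDs in the same transaction as the state change.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical consumer that skips messages whose ID it has already handled.
final class IdempotentConsumerSketch {
    private final Set<String> seenMessageIds = ConcurrentHashMap.newKeySet();
    private final AtomicInteger balance = new AtomicInteger();

    // Returns true if the message was applied, false if it was a duplicate.
    boolean handle(String messageId, int amount) {
        // add() is atomic: only the first delivery of an ID gets "true".
        if (!seenMessageIds.add(messageId)) {
            return false;
        }
        balance.addAndGet(amount);
        return true;
    }

    int balance() {
        return balance.get();
    }

    public static void main(String[] args) {
        IdempotentConsumerSketch consumer = new IdempotentConsumerSketch();
        consumer.handle("msg-1", 5);
        consumer.handle("msg-1", 5); // redelivery of the same message is ignored
        System.out.println(consumer.balance()); // prints 5
    }
}
```

This turns a non-idempotent operation (add 5 to the balance) into one that is safe to redeliver, at the cost of storing message IDs.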
API versioning lets you evolve APIs without breaking existing consumers. Main strategies:
URL path versioning (/api/v1/users, /api/v2/users): most visible, most explicit. Easy to test in a browser. Breaks REST's "resource URI should be stable" principle.
Header versioning (Accept: application/vnd.api+json;version=2): clean URLs, harder to test manually.
Query parameter versioning (/api/users?version=2): simple, but mixes versioning with business parameters.
Best practice: choose URL versioning (most teams do) and run two versions in parallel during the transition. Deprecate v1 after all consumers migrate to v2. Never break a published API without a sunset period.
A circuit breaker monitors calls to a downstream service and stops calling it when failures exceed a threshold — preventing cascading failures.
# Resilience4j config:
resilience4j.circuitbreaker.instances.paymentService:
  failure-rate-threshold: 50 # open at 50% failure rate
  slow-call-rate-threshold: 100 # also open if 100% calls are slow
  slow-call-duration-threshold: 2s
  wait-duration-in-open-state: 30s
  permitted-number-of-calls-in-half-open-state: 5
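To make the closed, open, and half-open states concrete, here is a minimal single-threaded sketch. It is not how Resilience4j is implemented: it assumes a consecutive-failure count instead of a failure-rate window, allows a single trial call in half-open, and takes the current time as a parameter so it can be tested without sleeping. The class name CircuitBreakerSketch is hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical, single-threaded circuit breaker using a consecutive-failure count.
final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    CircuitBreakerSketch(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < openMillis) {
                return fallback.get(); // fail fast without touching the downstream
            }
            state = State.HALF_OPEN; // wait elapsed: allow one trial call
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED; // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN; // trip (or re-trip) the breaker
                openedAt = nowMillis;
            }
            return fallback.get();
        }
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        CircuitBreakerSketch breaker = new CircuitBreakerSketch(2, 30_000);
        Supplier<String> failing = () -> { throw new IllegalStateException("payment down"); };
        Supplier<String> fallback = () -> "fallback";
        breaker.call(failing, fallback, 0);
        breaker.call(failing, fallback, 10);
        System.out.println(breaker.state()); // prints OPEN
    }
}
```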
Named after the watertight compartments in a ship's hull — a breach in one compartment doesn't sink the ship. In microservices, bulkheads isolate thread pools or connection pools per downstream dependency.
// Without bulkhead: slow InventoryService exhausts ALL threads
// → other endpoints (Payments, Users) also start timing out
// With bulkhead: a separate concurrency limit per downstream
// (semaphore-based here; resilience4j.thread-pool-bulkhead gives dedicated thread pools):
resilience4j.bulkhead.instances.inventoryService:
  maxConcurrentCalls: 10 # at most 10 concurrent inventory calls
  maxWaitDuration: 100ms # wait up to 100ms for a free slot, then reject
resilience4j.bulkhead.instances.paymentService:
  maxConcurrentCalls: 20
Even if InventoryService is slow and all 10 slots fill up, further inventory calls are rejected quickly while payment and user calls keep their own capacity. The failure is contained.
Retrying transient failures (network blip, brief service restart) automatically improves resilience. But naive fixed-interval retries can cause a thundering herd — all failed clients retry simultaneously, overwhelming the recovering service.
Exponential backoff: wait time doubles each retry (1s, 2s, 4s, 8s). Jitter: add random noise so clients don't all retry at the same millisecond.
resilience4j.retry.instances.orderService:
  max-attempts: 3 # 1 initial call + 2 retries
  wait-duration: 500ms
  enable-exponential-backoff: true
  exponential-backoff-multiplier: 2 # base waits: 500ms, then 1000ms
  enable-randomized-wait: true # turn jitter on
  randomized-wait-factor: 0.5 # ±50% jitter
@Retry(name = "orderService")
public Order getOrder(Long id) {
    return orderClient.findById(id); // retried on exception
}
Only retry idempotent operations. Never retry non-idempotent operations (POST to create, payment charge) without an idempotency key.
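The backoff arithmetic itself is small enough to sketch directly. The helper below is hypothetical (BackoffSketch is not a Resilience4j class) and assumes jitter is applied as a uniform spread around the base delay.

```java
import java.util.Random;

// Hypothetical helper: computes retry waits with exponential backoff and jitter.
final class BackoffSketch {
    // Wait before retry number `retry` (1-based), without jitter.
    static long baseDelayMillis(long initialMillis, double multiplier, int retry) {
        return Math.round(initialMillis * Math.pow(multiplier, retry - 1));
    }

    // Spreads the base delay uniformly over [base*(1-factor), base*(1+factor)].
    static long jitteredDelayMillis(long baseMillis, double factor, Random random) {
        double offset = (random.nextDouble() * 2 - 1) * factor; // in [-factor, +factor)
        return Math.round(baseMillis * (1 + offset));
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int retry = 1; retry <= 3; retry++) {
            long base = baseDelayMillis(500, 2.0, retry);
            System.out.println("retry " + retry + ": base " + base + "ms, jittered "
                    + jitteredDelayMillis(base, 0.5, random) + "ms");
        }
    }
}
```

With an initial wait of 500ms and a multiplier of 2, the base delays are 500ms, 1000ms, and 2000ms; a factor of 0.5 places each actual wait somewhere between half and one and a half times its base.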
Without timeouts, a slow downstream service ties up threads indefinitely, eventually exhausting the thread pool. Every service call must have a timeout.
Timeout budget: set timeouts based on the end user's expected response time, working backwards through the chain.
# If the API Gateway has a 3s timeout for the user:
# Service A → Service B → Service C chain
# A should time out its call to B at 2.5s (leaving 500ms for A's own work)
# B should time out its call to C at 1.5s (leaving 1s for B's work)
# Resilience4j TimeLimiter:
resilience4j.timelimiter.instances.inventoryService:
  timeout-duration: 1500ms
  cancel-running-future: true
Pass the remaining timeout budget via headers (deadline propagation) so each hop knows how much time is left. This prevents a service from retrying after the client has already timed out and moved on.
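One way to sketch deadline propagation is to carry an absolute deadline and have each hop derive its downstream timeout from what is left. The header name X-Request-Deadline and the DeadlineSketch class are assumptions made for illustration, not a standard; gRPC, for comparison, has deadline propagation built in.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for deadline propagation; the header name is an assumption.
final class DeadlineSketch {
    static final String HEADER = "X-Request-Deadline"; // epoch millis, illustrative only

    // Time left until the absolute deadline, never negative.
    static Duration remaining(Instant deadline, Instant now) {
        Duration left = Duration.between(now, deadline);
        return left.isNegative() ? Duration.ZERO : left;
    }

    // Timeout for the next hop: the remaining budget minus the time reserved
    // for this service's own work, capped by a per-call maximum.
    static Duration downstreamTimeout(Instant deadline, Instant now,
                                      Duration ownWork, Duration maxPerCall) {
        Duration budget = remaining(deadline, now).minus(ownWork);
        if (budget.isNegative() || budget.isZero()) {
            return Duration.ZERO; // nothing left: fail fast instead of calling
        }
        return budget.compareTo(maxPerCall) < 0 ? budget : maxPerCall;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant deadline = now.plusMillis(3000); // gateway's 3s budget
        System.out.println(HEADER + ": " + deadline.toEpochMilli());
        System.out.println(downstreamTimeout(deadline, now,
                Duration.ofMillis(500), Duration.ofSeconds(5))); // prints PT2.5S
    }
}
```

An absolute deadline relies on reasonably synchronised clocks between services; sending the remaining duration instead avoids that, but ignores time spent in transit.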
A fallback is what a service returns when the primary call fails or the circuit is open. Good fallbacks:
@CircuitBreaker(name = "recommendations",
                fallbackMethod = "defaultRecommendations")
public List<Product> getRecommendations(Long userId) {
    return recommendationService.getFor(userId);
}
public List<Product> defaultRecommendations(Long userId, Exception ex) {
    return cache.getPopularProducts(); // cached bestsellers as fallback
}
Distributed transactions across microservices cannot use traditional ACID (two-phase commit doesn't scale). The Saga pattern achieves consistency through a sequence of local transactions with compensating actions on failure.
Choreography Saga (event-driven):
OrderPlaced
→ Inventory: reserve stock → StockReserved
→ Payment: charge card → PaymentCharged
→ Shipping: schedule → OrderComplete
// On payment failure:
PaymentFailed
→ Inventory: release stock (compensating transaction)
Orchestration Saga: A central coordinator (e.g. using Apache Camel, Temporal, AWS Step Functions) drives each step and triggers compensations on failure.
Key insight: compensating transactions must be idempotent and must eventually succeed, because they are the "undo" mechanism of last resort. Design them to be retriable, and escalate to manual intervention if retries are exhausted.
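The core of an orchestrated saga can be sketched as a loop that remembers completed steps and undoes them in reverse order on failure. This is an in-process illustration with hypothetical names (SagaSketch, Step); a real orchestrator would persist saga state and retry compensations.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-process saga orchestrator: runs steps in order and, on
// failure, runs the compensations of completed steps in reverse order.
final class SagaSketch {
    static final class Step {
        final String name;
        final Runnable action;
        final Runnable compensation;

        Step(String name, Runnable action, Runnable compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    // Returns true if every step succeeded, false if the saga was compensated.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run(); // most recent step first
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = run(List.of(
                new Step("reserve", () -> log.add("stock reserved"), () -> log.add("stock released")),
                new Step("charge", () -> { throw new IllegalStateException("card declined"); },
                        () -> log.add("charge refunded"))));
        System.out.println(ok + " " + log); // prints false [stock reserved, stock released]
    }
}
```

The failed step's own compensation is not run, because its local transaction never committed; only steps that completed are undone.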
Graceful degradation means a service continues to provide reduced but useful functionality when a dependency is unavailable — instead of failing completely.
Examples:
Design services around what is essential vs optional. Non-essential features should have fallbacks that return empty data or cached results. Essential features need synchronous calls protected by timeouts, bounded retries, and circuit breakers.
Every service should expose health endpoints that orchestrators and load balancers can poll:
Liveness probe (/actuator/health/liveness): "is the process alive?" If DOWN, Kubernetes restarts the container.
Readiness probe (/actuator/health/readiness): "is the service ready to accept traffic?" If DOWN, Kubernetes removes the pod from the load balancer. Use this for slow startup, warm-up, or dependency checks.
# application.properties
management.endpoint.health.probes.enabled=true
management.health.livenessState.enabled=true
management.health.readinessState.enabled=true
# Kubernetes deployment.yaml
livenessProbe:
  httpGet: {path: /actuator/health/liveness, port: 8080}
  initialDelaySeconds: 30
readinessProbe:
  httpGet: {path: /actuator/health/readiness, port: 8080}
  initialDelaySeconds: 10
Sharing a database between services creates tight coupling at the data layer — the opposite of what microservices aim for:
Each service owns its schema and the only way to access another service's data is via its API. This enforces encapsulation at the data level and allows polyglot persistence:
Eventual consistency means that after an update, all replicas/services will converge to the same state — but not immediately. The window between "committed in service A" and "visible in service B" is the inconsistency window.
Design strategies:
CQRS (Command Query Responsibility Segregation) — separate the write model (commands) from the read model (queries). Write model is optimised for consistency; read model for query performance.
// Write side: normalised, ACID, handles commands
orderCommandService.placeOrder(cmd);
// Read side: denormalised, optimised for queries
OrderSummaryView view = orderQueryService.getOrderSummary(orderId);
Event Sourcing — instead of storing current state, store every state change as an immutable event. Current state is derived by replaying events.
// Events stored (append-only):
OrderPlaced → ItemAdded → ItemRemoved → OrderConfirmed → OrderShipped
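// Current state = replay of all events, in order
// (3-arg reduce because the event type differs from Order; the combiner is unused on a sequential stream)
Order order = eventStore.loadEvents(orderId).stream()
    .reduce(new Order(), Order::apply, (a, b) -> b);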
Benefits: complete audit trail, replay events to rebuild read models, time travel (what was the state at time T?). Complexity: event versioning, eventual consistency between command and query models.
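A self-contained version of the replay idea, including the "state at time T" benefit, could look like this. The event types and the Order fields are hypothetical and kept deliberately small.

```java
import java.util.List;

// Hypothetical event-sourced aggregate: state is rebuilt by folding events.
final class EventSourcingSketch {
    enum Type { ITEM_ADDED, ITEM_REMOVED, ORDER_CONFIRMED }

    static final class Event {
        final Type type;
        final int quantity;

        Event(Type type, int quantity) {
            this.type = type;
            this.quantity = quantity;
        }
    }

    // Immutable state: apply() returns the next state instead of mutating.
    static final class Order {
        final int itemCount;
        final boolean confirmed;

        Order(int itemCount, boolean confirmed) {
            this.itemCount = itemCount;
            this.confirmed = confirmed;
        }

        Order apply(Event e) {
            switch (e.type) {
                case ITEM_ADDED:      return new Order(itemCount + e.quantity, confirmed);
                case ITEM_REMOVED:    return new Order(itemCount - e.quantity, confirmed);
                case ORDER_CONFIRMED: return new Order(itemCount, true);
                default: throw new IllegalArgumentException("unknown event " + e.type);
            }
        }
    }

    // Current state is the left fold of all events over an empty order.
    static Order replay(List<Event> events) {
        Order order = new Order(0, false);
        for (Event e : events) {
            order = order.apply(e);
        }
        return order;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(Type.ITEM_ADDED, 3),
                new Event(Type.ITEM_REMOVED, 1),
                new Event(Type.ORDER_CONFIRMED, 0));
        System.out.println(replay(events).itemCount);               // prints 2
        System.out.println(replay(events.subList(0, 1)).itemCount); // prints 3 (state after the first event)
    }
}
```

Replaying a prefix of the event list is the time-travel query: it yields the state as it was after that many events.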
No database joins across service boundaries — you must use application-level joins. Options:
// API composition (synchronous):
Order order = orderService.getOrder(id);
User user = userService.getUser(order.getUserId());
return new OrderDetailView(order, user);
Two-Phase Commit (2PC) coordinates an atomic transaction across multiple systems via a coordinator:
Why not in microservices:
Use Saga pattern instead — compensating transactions replace rollback, no locks, no coordinator SPOF.
Each microservice owns its schema and must apply DB migrations as part of its own deployment. Tools: Flyway and Liquibase.
# Spring Boot auto-runs Flyway migrations on startup
spring.flyway.enabled=true
spring.flyway.locations=classpath:db/migration
# Migration file naming: V1__create_users.sql, V2__add_email_index.sql
Key rules for zero-downtime migrations:
// Cache-aside (lazy loading):
public Product getProduct(Long id) {
    Product cached = cache.get(id);          // 1. Check cache
    if (cached != null) return cached;       // 2. Cache hit → return
    Product product = db.findById(id);       // 3. Cache miss → load DB
    cache.put(id, product, 10, MINUTES);     // 4. Populate cache
    return product;
}
Invalidation strategies:
Cache stampede: when a popular key expires and many requests simultaneously miss the cache and hammer the database. Fix: probabilistic early expiry, mutex locks on cache miss, or background refresh.
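The mutex-on-miss fix can be sketched within a single JVM using ConcurrentHashMap, whose computeIfAbsent runs the loader at most once per absent key while other callers for that key wait. The class name SingleFlightCacheSketch is hypothetical, and the sketch has no expiry; a distributed cache would need a distributed lock or a library that supports this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache that lets only one caller load a missing key.
final class SingleFlightCacheSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleFlightCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent is atomic per key: concurrent callers for the same
        // missing key block until the single loader invocation finishes.
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger dbCalls = new AtomicInteger();
        SingleFlightCacheSketch<Long, String> cache = new SingleFlightCacheSketch<>(id -> {
            dbCalls.incrementAndGet(); // stands in for the database query
            return "product-" + id;
        });
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> cache.get(42L));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("db calls: " + dbCalls.get()); // prints db calls: 1
    }
}
```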
Defense in depth approach:
sub claim is an allowed service.
Client → API Gateway: POST /login {username, password}
API Gateway → Auth Service: validate credentials
Auth Service → API Gateway: JWT (signed with RS256 private key)
API Gateway → Client: JWT
Client → API Gateway: GET /orders (Authorization: Bearer <JWT>)
API Gateway: validate JWT signature (using public key), extract claims
API Gateway → Order Service: forward request + user claims in header
Order Service: trust the gateway-validated claims (no re-validation)
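Each downstream service doesn't need to validate the JWT signature again because the gateway already did. It reads user claims from a trusted header (X-User-Id, X-User-Roles) set by the gateway. This only holds if services are reachable solely through the gateway and the gateway strips any client-supplied X-User-* headers before setting its own. Never trust headers set by the client directly.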
Secrets (DB passwords, API keys, TLS certs) must never be hardcoded or stored in source control. Options in increasing security order:
Best practice: use Vault or a cloud secrets manager. Rotate secrets regularly. Use short-lived dynamic credentials so a leaked secret expires quickly.
Rate limiting prevents abuse and protects downstream services from overload. Common algorithms:
# Spring Cloud Gateway rate limiter (Redis-backed token bucket):
spring.cloud.gateway.routes:
  - id: order-service
    filters:
      - name: RequestRateLimiter
        args:
          redis-rate-limiter.replenishRate: 100 # 100 tokens/sec
          redis-rate-limiter.burstCapacity: 200 # max burst
          key-resolver: "#{@userKeyResolver}" # per-user limit
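The token bucket behind those two settings can be sketched in a few lines. This is an in-memory, single-process illustration with a hypothetical TokenBucketSketch class, not the Redis script that Spring Cloud Gateway uses; the clock is passed in so the behaviour can be tested without sleeping.

```java
// Hypothetical in-memory token bucket; parameter names mirror the settings above.
final class TokenBucketSketch {
    private final double replenishRatePerSecond;
    private final double burstCapacity;
    private double tokens;
    private long lastRefillNanos;

    TokenBucketSketch(double replenishRatePerSecond, double burstCapacity, long nowNanos) {
        this.replenishRatePerSecond = replenishRatePerSecond;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity; // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    // Pass System.nanoTime() in production code; tests can pass fixed values.
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
        // Refill in proportion to elapsed time, never above the burst capacity.
        tokens = Math.min(burstCapacity, tokens + elapsedSeconds * replenishRatePerSecond);
        lastRefillNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0; // each request costs one token
            return true;
        }
        return false; // caller would typically respond with HTTP 429
    }

    public static void main(String[] args) {
        TokenBucketSketch bucket = new TokenBucketSketch(100, 200, System.nanoTime());
        int allowed = 0;
        for (int i = 0; i < 250; i++) {
            if (bucket.tryAcquire(System.nanoTime())) {
                allowed++;
            }
        }
        System.out.println("allowed in a tight loop: " + allowed);
    }
}
```

The burst capacity bounds how many requests can pass at once after an idle period, while the replenish rate bounds the sustained throughput.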
Each service should have only the minimum permissions needed to do its job:
// Incoming request gets a Trace ID:
GET /checkout [TraceId: abc123, SpanId: span1]
// Gateway calls Order Service:
POST /orders [TraceId: abc123, SpanId: span2, ParentSpan: span1]
// Order Service calls Inventory:
GET /inventory/reserve [TraceId: abc123, SpanId: span3, ParentSpan: span2]
// Inventory calls DB:
SELECT ... [TraceId: abc123, SpanId: span4, ParentSpan: span3]
The TraceId stays the same through the entire request. Each hop creates a new SpanId. W3C TraceContext headers (traceparent) are the modern standard for propagating these IDs. Spring Boot + Micrometer Tracing propagates them automatically for RestTemplate and WebClient instances built from the auto-configured builders; for Kafka, observation has to be enabled on the template and listener.
In Jaeger/Zipkin you can visualise the full call tree and see exactly where latency is spent.
Structured logging means outputting logs as machine-parseable JSON instead of free-text strings:
// Unstructured (hard to query):
2026-06-23 12:01:05 INFO Order 12345 placed by user 42, total $199.99
// Structured JSON (easy to filter and aggregate):
{"timestamp":"2026-06-23T12:01:05Z","level":"INFO","service":"order-service",
"traceId":"abc123","spanId":"span2","orderId":12345,"userId":42,
"total":199.99,"event":"order.placed"}
With structured logs, Kibana/Grafana Loki can filter service="order-service" AND event="order.placed" AND total > 100 without regex parsing. The traceId field lets you jump from a trace in Jaeger directly to the logs for that request.
# Logback JSON config (logstash-logback-encoder):
<encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
Error budget: if SLO is 99.9% availability, your error budget is 0.1% downtime per month (~43 minutes). Spend it on risky deployments. When the budget is exhausted, freeze deployments and focus on reliability.
Track SLIs with Prometheus alerts. Alert on SLO burn rate, not individual incidents.
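The error-budget and burn-rate arithmetic can be checked with a small helper. ErrorBudgetSketch is a hypothetical name, and the 30-day window is an assumption for what "per month" means.

```java
import java.time.Duration;

// Hypothetical helper for error-budget and burn-rate arithmetic.
final class ErrorBudgetSketch {
    // Allowed downtime in a window for a given availability SLO (e.g. 0.999).
    static Duration errorBudget(double slo, Duration window) {
        return Duration.ofMillis(Math.round(window.toMillis() * (1.0 - slo)));
    }

    // Burn rate: observed error ratio divided by the ratio the SLO allows.
    // 1.0 means the budget lasts exactly the window; above 1.0 it runs out early.
    static double burnRate(double observedErrorRatio, double slo) {
        return observedErrorRatio / (1.0 - slo);
    }

    public static void main(String[] args) {
        Duration budget = errorBudget(0.999, Duration.ofDays(30));
        System.out.println(budget.toMinutes() + " minutes"); // prints 43 minutes
        System.out.println(burnRate(0.01, 0.999));           // roughly 10
    }
}
```

A burn rate of about 10 means a 1% error ratio would use up a 30-day budget in roughly three days, which is the kind of signal worth paging on.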
A canary deployment routes a small percentage of traffic to the new version before rolling out to everyone. Named after the "canary in a coal mine" — if it dies, you know there's a problem:
# Kubernetes with Argo Rollouts:
strategy:
  canary:
    steps:
      - setWeight: 5 # 5% traffic to new version
      - pause: {duration: 5m}
      - analysis: {templates: [{templateName: error-rate-check}]}
      - setWeight: 50 # 50% if analysis passed
      - pause: {duration: 10m}
      - setWeight: 100 # full rollout
Monitor error rate and latency during each step. If metrics degrade beyond a threshold, roll back automatically. This lets you validate the new version in production with minimal blast radius.
Blue-green deployment maintains two identical production environments. "Blue" is live. "Green" has the new version deployed and tested:
Advantages: instant rollback (just switch LB back), zero-downtime. Disadvantages: double infrastructure cost, database schema changes must be backward-compatible with both versions.
When Service A calls Service B, and you're deploying a breaking API change in B, deployment order matters.
Safe deployment order for breaking changes:
This is the expand-contract pattern applied to APIs. Always deploy the provider change first (backward compatible), then consumers, then remove old behavior.
Consumer-driven contract tests (Pact) catch this before deployment — the provider's CI pipeline verifies it still satisfies all consumer contracts.
Microservices are designed for horizontal scaling — each service scales independently based on its own load. Kubernetes HPA (Horizontal Pod Autoscaler) scales pods automatically based on CPU/memory or custom metrics (e.g. Kafka consumer lag).
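apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service
spec:
  # scaleTargetRef needs apiVersion and kind; a Deployment is assumed here
  scaleTargetRef: {apiVersion: apps/v1, kind: Deployment, name: order-service}
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource: {name: cpu, target: {type: Utilization, averageUtilization: 70}}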
Chaos engineering is the practice of deliberately injecting failures into a system in production to validate that it can withstand real-world disruptions. Netflix coined the term with Chaos Monkey (randomly terminates EC2 instances).
Experiments:
# Istio fault injection (fragment of a VirtualService spec):
http:
  - fault:
      delay:
        percentage: {value: 50}
        fixedDelay: 3s # 50% of requests to inventory get 3s delay
Run chaos experiments in staging first, then production during business hours with an on-call engineer present. The goal: find weaknesses before users do.
The microservice testing pyramid:
This is the "real world incident" senior question. A structured runbook:
kubectl get pods -n production | grep order. Are pods CrashLoopBackOff? OOMKilled? Pending?
kubectl logs <pod> --previous. Is there a startup exception? DB connection failure? OutOfMemoryError?
kubectl get pods -n databases. Is Redis up? Is a downstream service the actual root cause?