
Kubernetes: Keeping Containers Alive at 3 AM

You deploy your app in a container. It crashes at 3 AM. No one is awake to restart it.

Kubernetes solves this. It watches your containers, restarts them when they crash, routes traffic to healthy ones, and scales them up when traffic spikes. You write YAML files describing what you want. Kubernetes makes it happen.

Why Orchestration?

You can run containers manually. Docker works fine. Start a container, map ports, done.

But production is messy. Containers crash. Servers die. Traffic spikes. You need 5 copies of your API but only 2 of your database. You want zero-downtime deployments. Doing this manually means writing bash scripts, setting up monitoring, and hoping nothing breaks while you sleep.

The Problem

How do you keep containers running when they crash at 3 AM and no one is awake?

[Interactive demo: Manual vs Orchestrated. See how Kubernetes handles failures automatically.]

Try It Out

Try crashing containers in manual mode. You have to restart them yourself. Switch to Kubernetes mode and crash containers. They restart automatically. The load balancer routes traffic around failures.

Key Insight

Orchestration means declaring what you want and letting the system figure out how to maintain it. You say "I want 3 replicas." Kubernetes keeps 3 replicas running, even when nodes die.
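
A minimal Deployment that declares this desired state might look like the sketch below (the app name, image, and port are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3                  # desired state: keep 3 pods running
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:1.0     # placeholder image
          ports:
            - containerPort: 8080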

Pods: The Smallest Unit

Kubernetes doesn't manage containers directly. It manages pods.

A pod is one or more containers that run together on the same machine, sharing network and storage. Usually one container per pod, but sometimes you need sidecars: a logging agent, a proxy, something that helps the main container.

Containers in a pod can talk to each other on localhost. They share volumes. They start and stop together. If you need two containers that are tightly coupled, put them in the same pod.
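
A two-container pod with a logging sidecar might look roughly like this (image names are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: web-with-logger
spec:
  containers:
    - name: web                # main container
      image: myapp:1.0         # placeholder image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-agent          # sidecar: ships logs from the shared volume
      image: log-agent:1.0     # placeholder image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
  volumes:
    - name: logs
      emptyDir: {}             # shared scratch space, lives as long as the pod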

[Interactive demo: Pod Lifecycle. Create and manage pod states.]

Try It Out

Click "Create Pod" and watch it go through phases: Pending (waiting for a node), Running (containers started), or Failed (something went wrong). Click "Delete Pod" and see how Kubernetes terminates gracefully before removing it.

Key Insight

Pods are ephemeral. They have IP addresses, but those IPs change when pods restart. You can't rely on pod IPs. That's where Services come in.

Services: Finding Things in the Cluster

The Problem

Your frontend needs to talk to your backend. Your backend runs in 3 pods that restart, scale, and move around. How does the frontend find them?

A Service is a stable endpoint that routes traffic to pods. It has a fixed IP and DNS name. Behind that IP, Kubernetes load balances across all matching pods.

[Interactive demo: Service Discovery. Services route to ready pods automatically.]

Try It Out

Add and remove backend pods. The Service automatically updates its list of targets. Send requests and watch them distribute across available pods.

Services use selectors to find pods. You label your pods app=backend, and the Service selects all pods with that label. When new pods appear with the label, the Service includes them. When pods die, the Service removes them.

There are different Service types:

  • ClusterIP: Internal only. Other pods can reach it, external traffic can't.
  • NodePort: Exposes the Service on a port on every node. External traffic can hit any node's IP on that port.
  • LoadBalancer: Creates an external load balancer (in cloud environments) with a public IP.

Most services are ClusterIP. You use LoadBalancer for the entry point to your cluster.
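
The backend Service from the demo above would be defined roughly like this:

apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  type: ClusterIP              # the default; reachable only inside the cluster
  selector:
    app: backend               # routes to every ready pod with this label
  ports:
    - port: 80                 # the Service's stable port
      targetPort: 8080         # the port the containers listen on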

Deployments: Rolling Updates Without Downtime

The Problem

You have a new version of your app. You need to deploy it to 10 pods without downtime. The naive approach: delete all old pods, start new ones. Your app is down for 30 seconds while new pods boot. How can we do better?

Kubernetes uses rolling updates. It brings up new pods gradually and removes old pods only when new ones are healthy.

[Interactive demo: Rolling Update. Zero-downtime deployments with gradual rollout.]

Try It Out

Click "Deploy v2" and watch the update. Kubernetes starts a few new pods, waits for them to pass health checks, then terminates a few old pods. It repeats until all pods are updated. Traffic keeps flowing to whichever pods are ready.

You control the speed with two settings, sketched below:

  • maxSurge: How many extra pods to create during the update (e.g., 25% of 10 desired pods rounds up to 3, so up to 13 pods can exist briefly during the rollout)
  • maxUnavailable: How many pods can be down during the update (e.g., 25% = at most 2 of 10 can be unavailable)
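
In the Deployment manifest, those settings live under the update strategy. A sketch:

spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%            # rounds up: with 10 replicas, up to 3 extra pods
      maxUnavailable: 25%      # rounds down: at most 2 of the 10 may be unavailable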

If the new version fails health checks, the rollout stops. The bad pods never receive traffic. Old pods keep serving. You fix the bug and try again.

What if I deploy a bad version and don't notice until later?

Rollbacks. Kubernetes keeps the previous ReplicaSet around. You run kubectl rollout undo and it reverses the deployment. The old version comes back using the same rolling update process.

Scaling: Reacting to Load

The Problem

Traffic doubles. Your 3 pods are maxed out. You need 6. Manual scaling is easy: kubectl scale deployment/myapp --replicas=6. But you don't want to watch dashboards at 2 AM and run commands manually. How can this be automated?

Horizontal Pod Autoscaler (HPA) watches metrics and adjusts replicas automatically.
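
A minimal HPA manifest might look like this (the Deployment name and replica bounds are placeholders):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp                # the Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # "keep CPU at 70%"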

[Interactive demo: Horizontal Pod Autoscaler. Scale replicas based on CPU usage.]

Try It Out

Drag the traffic slider and watch the autoscaler react. CPU goes up, more pods are created. Traffic drops, excess pods are removed. The autoscaler waits a bit before scaling down to avoid flapping.

You set a target metric: "Keep CPU at 70%." The autoscaler does math. If current CPU is 90%, it scales up. If current CPU is 40%, it scales down.

The formula is roughly: desiredReplicas = currentReplicas * (currentMetric / targetMetric), rounded up. If you have 3 pods at 90% CPU and you want 70%, you get 3 * (90/70) ≈ 3.9, which rounds up to 4 pods.

Autoscaling works with custom metrics too: requests per second, queue depth, whatever you expose via Prometheus or the metrics API.

Self-Healing: The Cluster Fixes Itself

Kubernetes constantly watches for problems and fixes them.

Pod crashed? Start a new one. Node died? Move the pods to another node. Container is stuck but not technically crashed? Kill it and restart.

[Interactive demo: Self-Healing. Watch Kubernetes recover from failures.]

Try It Out

Try crashing containers and nodes. Watch Kubernetes detect failures and recover. Kill a pod, a new one appears. Kill a node, its pods move elsewhere.

Kubernetes uses three types of health checks:

  • Liveness probes: Is the container alive? If not, kill and restart it.
  • Readiness probes: Is the container ready to serve traffic? If not, remove it from the load balancer but don't kill it.
  • Startup probes: Give slow-starting containers more time before liveness checks begin.

A typical liveness probe: HTTP GET to /healthz every 10 seconds. If it fails 3 times in a row, the kubelet kills the container and restarts it in place. The pod itself stays; the ReplicaSet only creates a replacement when a pod disappears entirely, such as when it's deleted or its node dies.

Readiness is different. A container might be alive but temporarily unable to serve traffic: it's warming up caches, waiting for a database, whatever. Readiness probes remove the pod from Service endpoints but leave it running.
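
On a container, those probes might be declared like this (the /healthz details match the description above; the /ready path and its timing are placeholders):

containers:
  - name: myapp
    image: myapp:1.0           # placeholder image
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10        # check every 10 seconds
      failureThreshold: 3      # 3 consecutive failures -> restart the container
    readinessProbe:
      httpGet:
        path: /ready           # placeholder path
        port: 8080
      periodSeconds: 5         # failures remove the pod from Service endpoints, no restart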

Scheduling: Deciding Where Pods Run

The Problem

You have 10 nodes. A new pod needs to run. Which node should get it?

The scheduler looks at constraints:

  • Does the node have enough CPU and memory?
  • Does the pod require specific hardware (GPU, SSD)?
  • Are there affinity rules (run near other pods or away from them)?
  • Are there taints (nodes rejecting certain pods)?

[Interactive demo: Scheduler. Watch pods get assigned to nodes based on resources.]

Try It Out

Add pods with different resource requirements. Watch the scheduler place them. Try adding a pod that's too big for any single node. It stays pending.

Node affinity pulls pods toward certain nodes. You can say "prefer nodes with SSDs" or "require nodes in us-east-1." Pod affinity controls relationships between pods: "run my cache pod on the same node as my app pod" or "spread database replicas across nodes so one node failure doesn't kill them all."

Taints and tolerations work the opposite way. A node says "I'm tainted gpu=true:NoSchedule." Only pods that tolerate that taint can run there. This keeps GPU nodes reserved for GPU workloads.
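
A pod spec can express all of these constraints. Here is a sketch with resource requests, a node affinity preference, and a toleration (the container, labels, and taint key are illustrative):

spec:
  containers:
    - name: trainer
      image: trainer:1.0        # placeholder image
      resources:
        requests:
          cpu: "2"              # the scheduler only picks nodes with 2 CPUs
          memory: 4Gi           # and 4 GiB of memory still free
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]  # prefer nodes labeled disktype=ssd
  tolerations:
    - key: gpu
      operator: Equal
      value: "true"
      effect: NoSchedule        # allows scheduling onto nodes tainted gpu=true:NoSchedule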

If a pod can't be scheduled (not enough resources, no matching nodes), it stays Pending. You see this in kubectl get pods and have to add capacity or adjust the pod's requirements.

ConfigMaps and Secrets: Configuration Without Rebuilding

The Problem

Your app needs configuration: database URLs, feature flags, API keys. Hardcoding them in the image is bad. Environment variables work but are scattered across deployment YAML. How do we manage config properly?

ConfigMaps store configuration as key-value pairs. Secrets store sensitive data (passwords, tokens) with base64 encoding and access controls.
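
The app-config ConfigMap used in the demo below would be defined roughly like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  API_URL: "https://api.example.com"
  LOG_LEVEL: "info"
  MAX_RETRIES: "3"             # ConfigMap values are always strings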

[Interactive demo: ConfigMap. Manage application configuration.]

Try It Out

Change the ConfigMap and see how it affects the pod. With some setups, you can update config without restarting pods. With others, you need a rolling restart to pick up changes.

You can mount ConfigMaps as files or inject them as environment variables.

# In the container spec: every key becomes an environment variable
envFrom:
  - configMapRef:
      name: app-config

Or mount as a volume:

# In the pod spec: define the volume from the ConfigMap
volumes:
  - name: config
    configMap:
      name: app-config
# In the container spec: mount it; each key becomes a file under /etc/config
volumeMounts:
  - name: config
    mountPath: /etc/config

Secrets work the same way but with restricted RBAC. Not everyone can read secrets. In managed Kubernetes, secrets are often encrypted at rest and integrated with cloud KMS.
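
A Secret manifest looks almost identical to a ConfigMap; with stringData you write plain text and Kubernetes stores it base64-encoded (the name and values here are obviously fake):

apiVersion: v1
kind: Secret
metadata:
  name: db-credentials          # hypothetical name
type: Opaque
stringData:                     # plain text here; stored base64-encoded under .data
  DB_USER: app
  DB_PASSWORD: not-a-real-password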

Warning

Are base64-encoded secrets actually secure?

No. Base64 is encoding, not encryption. Anyone with API access to your cluster can decode secrets. The security comes from RBAC (restricting who can read them) and encryption at rest. For better security, use external secret managers like Vault, AWS Secrets Manager, or Google Secret Manager with Kubernetes operators that sync secrets in.

In Practice

Real clusters combine these pieces.

You write a Deployment that specifies pod template, replica count, and update strategy. The Deployment creates a ReplicaSet that ensures the right number of pods exist. A Service provides a stable endpoint. An HPA watches metrics and scales replicas. ConfigMaps hold non-sensitive config, Secrets hold credentials.

When you deploy:

  1. Kubernetes schedules pods onto nodes with enough resources
  2. Kubelet on each node starts containers
  3. Liveness and readiness probes begin
  4. Service endpoints update when pods become ready
  5. If pods crash, they restart
  6. If nodes die, pods move
  7. If traffic spikes, HPA scales up

You don't write bash scripts to do this. You write YAML describing the desired state. The control plane watches actual state and reconciles differences.

Controllers and Reconciliation

The pattern behind all of this is the reconciliation loop.

Every Kubernetes controller watches resources and ensures actual state matches desired state. The Deployment controller watches Deployments and creates ReplicaSets. The ReplicaSet controller watches ReplicaSets and creates Pods. The scheduler watches Pods and assigns them to nodes.

Each controller runs a loop:

  1. Watch for changes
  2. Compare desired state to actual state
  3. Take action to reconcile (create, update, delete resources)
  4. Repeat

Key Insight

This is declarative rather than imperative. You don't run commands to create 3 pods. You declare "I want 3 pods" in a Deployment. The controller sees 0 pods exist, creates 3. If one crashes, the controller sees 2 exist, creates 1 more.

If you manually delete a pod, the controller doesn't care why it's gone. It just sees "I want 3, I have 2" and creates another. This is why killing pods doesn't permanently remove them. You have to delete the Deployment or scale it to 0.

Summary

Kubernetes manages containers at scale. Instead of manually starting, stopping, and monitoring containers, you describe what you want and Kubernetes maintains it.

Key Insight

The magic is reconciliation. Controllers constantly watch actual state and fix drift from desired state. You declare what you want. Kubernetes makes it happen and keeps it happening, even when things break.

Here's what you've learned:

  • Pods are the unit of deployment, usually one container per pod
  • Services provide stable networking to find pods
  • Deployments manage rolling updates and rollbacks
  • HPA scales replicas based on metrics
  • Self-healing restarts crashed containers and reschedules pods from dead nodes
  • The scheduler places pods on nodes based on resources
  • ConfigMaps and Secrets separate configuration from code

When someone says "we use Kubernetes," they mean: containers wrapped in Pods, managed by Deployments, exposed via Services, scaled by HPA, scheduled across nodes, with configs in ConfigMaps and credentials in Secrets. All of it watched by controllers that fix problems automatically.

Learning Kubernetes means learning the primitives and how they compose. Pods, Services, Deployments, and ConfigMaps cover 80% of what you need. The rest is solving specific problems: persistent storage with PersistentVolumes, batch jobs with Jobs and CronJobs, cluster networking with Ingress, access control with RBAC.

Try It Out

Start with a Deployment and a Service. Deploy something. Break it. Watch it recover. That's the core loop that makes Kubernetes useful.