
# How to debug the Kubernetes Pod Pending state?

A technical guide to debugging Pod Pending states: resource constraints, PriorityClasses and preemption, taints, affinity rules, zone-locked PVs, and node consolidation by Cluster Autoscaler and Karpenter.

## What is the Pod Pending state?

A pod remains in the Pending state when the Kubernetes scheduler (kube-scheduler) cannot find a feasible node that satisfies all of the pod's constraints. Unlike a container stuck in the Waiting state (often an image pull issue), a Pending pod with no assigned node indicates a scheduling failure. The authoritative record of why a pod is pending is the Events section of `kubectl describe pod <name>`.

## How to debug a pod stuck in the Pending state?

A pod enters the Pending state when Kubernetes accepts the pod specification but cannot schedule it onto a node. The scheduler evaluates every node against the pod's requirements: resource requests, node selectors, tolerations, affinity rules, volume claims, and priority. A node must satisfy all constraints; if none does, the pod remains Pending.

`kubectl describe pod <pod-name>` surfaces the reason in the Events section:

  • 0/3 nodes are available: insufficient cpu or insufficient memory
  • 0/3 nodes are available: 1 node(s) had taint {key: value}, that the pod didn't tolerate
  • pod has unbound immediate PersistentVolumeClaims
  • 0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector
  • 0/3 nodes are available: 3 Insufficient cpu. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod

Each message points to a different constraint failure category.

## Resource constraints

Kubernetes schedules based on requests, not actual usage. A pod requesting 4Gi of memory won't schedule if every node's unreserved allocatable memory (allocatable minus existing requests) is below 4Gi, regardless of actual memory consumption on those nodes.

Check allocatable versus requested resources:

```
kubectl describe nodes | grep -A 5 "Allocated resources"
```

The gap between "Allocated resources" and "Allocatable" determines what's available for new pods. If your pod's requests exceed this gap on every node, it stays Pending.
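
As a concrete illustration, here is a minimal sketch of a pod that requests 4Gi of memory (the pod name and image are hypothetical placeholders); if no node has 4Gi of unreserved allocatable memory, it stays Pending no matter how idle the nodes actually are:

```
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: memory-hungry           # hypothetical name, for illustration only
spec:
  containers:
  - name: app
    image: nginx                # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "4Gi"           # the scheduler reserves this amount, regardless of actual usage
      limits:
        memory: "4Gi"
EOF
```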

Namespaces with ResourceQuotas add another constraint layer. A namespace might have a 32Gi memory quota with 30Gi already allocated across existing pods. A new pod requesting 4Gi fails quota admission before the scheduler even evaluates node capacity:

```
kubectl describe resourcequota -n <namespace>
```

LimitRanges enforce per-pod or per-container constraints. A pod without explicit requests inherits default values from the LimitRange. These defaults might exceed available node capacity even if the workload's actual needs are minimal.
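
For reference, a sketch of a LimitRange that injects defaults (the name and namespace are hypothetical); a container created without explicit requests in this namespace inherits 1 CPU / 2Gi, which may exceed the free capacity on every node even if the workload needs far less:

```
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults      # hypothetical name
  namespace: team-a             # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:             # injected into containers that omit requests
      cpu: "1"
      memory: 2Gi
    default:                    # injected as limits when omitted
      cpu: "2"
      memory: 4Gi
EOF
```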

## Pod priority and preemption

When cluster capacity is exhausted, the scheduling outcome depends on PriorityClasses. Higher-priority pods can preempt (evict) lower-priority pods to free resources.

A pod stays Pending when:

  1. No node has sufficient resources, AND
  2. The pod's priority isn't high enough to preempt existing workloads, OR
  3. Preemption is disabled for the pod (preemptionPolicy: Never), OR
  4. No combination of evictions would free sufficient resources without violating PodDisruptionBudgets

The scheduler evaluates preemption candidates by simulating evictions. If evicting lower-priority pods on a node would free enough resources AND wouldn't violate PDBs, those pods become preemption victims. If no viable preemption path exists, the higher-priority pod remains Pending.

Check your pod's priority:

```
kubectl get pod <pod-name> -o jsonpath='{.spec.priorityClassName}'
kubectl get priorityclass
```

A pod with no PriorityClass gets the cluster's default priority (often 0). System-critical pods (`system-cluster-critical`, `system-node-critical`) have priorities in the billions. A priority-0 pod won't preempt anything; it waits for resources to become available through natural pod termination or node scaling.

The describe output shows preemption evaluation:
```
0/3 nodes are available: 3 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.
```

"No preemption victims found" means either all running pods have equal or higher priority, or evicting lower-priority pods wouldn't free enough resources.

## Taints and Tolerations

Taints mark nodes as unsuitable for pods lacking matching tolerations. A node tainted with `dedicated=ml-workloads:NoSchedule` only accepts pods that tolerate that specific taint.

Scenarios that cause Pending states:

  • Dedicated node pools where your pod lacks the required toleration.
  • Control plane nodes tainted with `node-role.kubernetes.io/control-plane:NoSchedule` by default.
  • Nodes under maintenance receiving `node.kubernetes.io/unschedulable` taints during drain operations.
  • Cloud provider taints such as `node.kubernetes.io/not-ready` during node initialization.

The describe output identifies blocking taints:
```
0/3 nodes are available: 1 node(s) had taint {dedicated: ml-workloads}, that the pod didn't tolerate, 2 node(s) had taint {node.kubernetes.io/unschedulable: }, that the pod didn't tolerate.
```

List node taints and pod tolerations:

```
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
kubectl get pod <pod-name> -o jsonpath='{.spec.tolerations}'
```
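
If the pod is actually meant for the tainted pool, give it a matching toleration. A minimal sketch for the `dedicated=ml-workloads:NoSchedule` taint from the example above (pod name and image are placeholders):

```
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ml-trainer              # hypothetical name
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "ml-workloads"
    effect: "NoSchedule"        # matches the taint's effect
  containers:
  - name: app
    image: nginx                # placeholder image
EOF
```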

## Node selectors and affinity rules

Node selectors and affinity rules are pod-side constraints that restrict where the scheduler can place the pod:

  • A pod with `nodeSelector: {disktype: ssd}` only schedules onto nodes labeled `disktype=ssd`. If no nodes have that label, or all labeled nodes lack capacity, the pod remains Pending.
  • Node affinity provides richer semantics. `requiredDuringSchedulingIgnoredDuringExecution` rules are hard constraints that must be satisfied; `preferredDuringSchedulingIgnoredDuringExecution` rules influence scoring but don't block scheduling (see the sketch below).
  • Pod anti-affinity rules that require spreading across failure domains can leave a pod Pending. If you require one replica per zone (`topologyKey: topology.kubernetes.io/zone` with `requiredDuringScheduling…`) and all zones already have a replica, additional replicas can't schedule.

Check selector and affinity configuration:

```
kubectl get nodes --show-labels
kubectl get pod <pod-name> -o jsonpath='{.spec.nodeSelector}'
kubectl get pod <pod-name> -o jsonpath='{.spec.affinity}' | jq
```
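
The sketch below (pod name, image, and zone value are placeholders) contrasts a hard `required…` rule with a soft `preferred…` rule in one pod spec:

```
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ssd-workload            # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard constraint: no matching node means the pod stays Pending.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
      # Soft preference: affects scoring only, never blocks scheduling.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]   # placeholder zone
  containers:
  - name: app
    image: nginx                # placeholder image
EOF
```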

## Persistent volume claim binding failures

`pod has unbound immediate PersistentVolumeClaims` indicates the pod references a PVC without a bound PersistentVolume.

Causes include:

  • The PVC's StorageClass doesn't exist or specifies a non-existent provisioner.
  • The storage provisioner lacks permissions to create volumes (IAM roles, service account bindings).
  • Cloud provider volume quotas are exhausted.
  • The PVC's access mode (ReadWriteOnce, ReadWriteMany) or capacity doesn't match any available PV.
  • Zone constraints prevent binding: EBS volumes are zone-specific, and a PVC requesting storage in us-east-1a can't bind to a PV in us-east-1b.

Zone-locked volumes create secondary scheduling constraints. Once a PVC binds to a volume in a specific zone, the pod must schedule to a node in that zone. If that zone's nodes lack capacity, the pod stays Pending despite available resources in other zones.

Trace the binding chain:

```
kubectl get pvc <pvc-name>
kubectl describe pvc <pvc-name>
kubectl get storageclass
kubectl describe storageclass <class-name>
```
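
One way to avoid the zone-lock trap (see the resolution patterns below) is a StorageClass with `WaitForFirstConsumer`, which defers volume provisioning until the pod is scheduled so the volume is created in the pod's zone. A sketch, assuming the AWS EBS CSI driver and a hypothetical class name:

```
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-wait                            # hypothetical name
provisioner: ebs.csi.aws.com                # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer     # bind after the pod is scheduled, in the pod's zone
reclaimPolicy: Delete
EOF
```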

## Pod evictions and node consolidation

Karpenter, Cluster Autoscaler, and similar tools consolidate workloads by draining underutilized nodes. Pods evicted during consolidation must reschedule onto remaining nodes.

Consolidation-induced Pending states occur when:

Evicted pods have resource requests that the remaining nodes can't accommodate: the combined requests fit on the original node but exceed the headroom on denser, consolidated nodes.

Node selectors or affinity rules exclude newly provisioned nodes. Karpenter provisions based on pending pod requirements, but if those requirements include selectors for labels only present on the terminated node, new nodes won't match.

Taints on new nodes don't match pod tolerations. Node pools might apply different taints than the original nodes.

PodDisruptionBudgets block evictions but don't prevent consolidation from being attempted. Repeated consolidation attempts against PDB-protected workloads can cause scheduling churn without successful pod movement.

Karpenter's consolidation decision simulates whether the pods it would evict could schedule onto remaining or replacement nodes. Mismatches between this prediction and actual scheduling (due to race conditions with other pending pods, or constraints not fully evaluated) leave pods stranded.
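
For reference, a PodDisruptionBudget is the mechanism that makes consolidation back off. A minimal sketch that keeps at least two replicas of a hypothetical `checkout` workload up during voluntary evictions:

```
cat <<'EOF' | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb            # hypothetical name
spec:
  minAvailable: 2               # voluntary evictions (drain, consolidation) can't go below this
  selector:
    matchLabels:
      app: checkout             # hypothetical label
EOF
```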

## Debugging sequence to resolve the issue

  1. Get the scheduler's reason:
     kubectl describe pod <pod-name> | grep -A 20 Events
  2. For resource failures, compare requests to capacity:
     kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources.requests}'
     kubectl describe nodes | grep -A 10 "Allocated resources"
     kubectl describe resourcequota -n <namespace>
  3. For preemption failures, check priority:
     kubectl get pod <pod-name> -o jsonpath='{.spec.priorityClassName}'
     kubectl get pods -n <namespace> -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.priority
  4. For taint failures:
     kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
     kubectl get pod <pod-name> -o jsonpath='{.spec.tolerations}'
  5. For affinity failures:
     kubectl get nodes --show-labels | grep <expected-label>
     kubectl get pod <pod-name> -o jsonpath='{.spec.affinity}' | jq
  6. For PVC failures:
     kubectl get pvc -n <namespace>
     kubectl describe pvc <pvc-name>
     kubectl get events -n <namespace> --field-selector involvedObject.name=<pvc-name>
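
The sequence above can be strung together into a small read-only triage script, a sketch that takes the pod name and an optional namespace as arguments:

```
#!/usr/bin/env bash
# Sketch: gather the usual evidence for a Pending pod in one pass (read-only).
set -euo pipefail
POD="$1"; NS="${2:-default}"

echo "== Scheduler events =="
kubectl describe pod "$POD" -n "$NS" | grep -A 20 Events

echo "== Pod requests, priority, tolerations, affinity =="
kubectl get pod "$POD" -n "$NS" \
  -o jsonpath='{.spec.containers[*].resources.requests}{"\n"}{.spec.priorityClassName}{"\n"}{.spec.tolerations}{"\n"}{.spec.affinity}{"\n"}'

echo "== Node capacity and taints =="
kubectl describe nodes | grep -A 10 "Allocated resources"
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

echo "== Quotas and PVCs in the namespace =="
kubectl describe resourcequota -n "$NS"
kubectl get pvc -n "$NS"
```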

## Resolution patterns

  • Resource constraints: Reduce pod requests to actual requirements. Increase node count or size. Enable cluster autoscaling. Adjust namespace ResourceQuotas.
  • Priority/preemption: Assign appropriate PriorityClasses to workloads. Production workloads should have higher priority than batch jobs. Verify preemption isn't disabled on pods that need it.
  • Taints: Add required tolerations to pod specs. Remove obsolete taints from nodes. For maintenance taints, wait for drain completion or expedite the maintenance window.
  • Node selectors/affinity: Verify target nodes have required labels. Convert hard requirements to preferences where strict placement isn't necessary. For zone-spreading anti-affinity, ensure sufficient zones exist for replica count.
  • PVC binding: Fix storage provisioner permissions. Verify StorageClass configuration. For zone-locked volumes, ensure node capacity exists in the volume's zone. Consider using WaitForFirstConsumer volume binding mode to defer zone selection until pod scheduling.
  • Consolidation conflicts: Increase Karpenter's consolidation threshold to maintain more headroom. Add appropriate node selectors to Karpenter provisioners. Configure PDBs to rate-limit evictions. Review whether pod resource requests accurately reflect actual usage.

## How Resolve AI approaches scheduling failures

Scheduling failures sit at the intersection of multiple configuration domains: pod specs, node configurations, cluster policies, storage provisioners, and autoscaler behavior. The scheduler reports which constraint failed but not why that constraint exists or what changed to cause the failure.

Resolve AI investigates Pending pods by correlating the pod's requirements against cluster state and recent changes. A preemption failure might trace to a recent PriorityClass modification. A resource constraint might connect to a deployment scaling event that consumed previously available capacity. A taint mismatch might stem from a Karpenter provisioner change that altered which taints new nodes receive.

For clusters running Karpenter or similar consolidation tools, Resolve AI identifies patterns where autoscaler behavior and pod constraints conflict: workloads whose resource requests prevent consolidation from finding viable targets, node selectors that exclude provisioner-managed nodes, or PDBs that block consolidation without operators realizing the interaction. Surfacing these configuration tensions before they cause extended Pending states moves the work from incident response to configuration validation.