Kubernetes Node Not Ready: Troubleshoot Common Causes & Best Practices

What Is the Kubernetes Node NotReady Error?

The "Node NotReady" error indicates that a node is currently unavailable or not in a ready state to run workloads. When a node is in the NotReady state, the Kubernetes control plane stops scheduling new pods onto that node and, if the node does not recover, evicts its existing pods so they can be rescheduled onto healthy nodes to maintain the desired state of the cluster. The node controller continuously monitors the health of nodes; if a node fails to report back within a specific grace period, it is marked as NotReady.

Affected nodes show NotReady in the STATUS column of kubectl get nodes:

NAME             STATUS        AGE     VERSION
node-1           Ready         18d     v1.29.1
node-2           NotReady      18d     v1.29.1

Understanding the Kubernetes Node States:

Kubernetes nodes can be in various states that indicate their status and health. A node can be in one of the following states:

1. Ready: The node is healthy and ready to run pods. It means that the node is functioning properly and can host workloads.

2. NotReady: The node is not healthy and cannot run pods. It means the node is experiencing issues. Pods scheduled on this node may be evicted or rescheduled to other nodes.

3. SchedulingDisabled: The node is marked as unschedulable (for example, after running kubectl cordon), meaning no new pods will be scheduled on it.

4. Unknown: If the node controller has not heard from the node within the node-monitor-grace-period (default 40s), the node status is shown as Unknown.

Causes of Kubernetes Node NotReady Error:

Various issues can cause a Kubernetes node to show a NotReady status. Let's discuss:

1. Lack of Resources: Nodes in a Kubernetes cluster require sufficient CPU, memory, and disk resources to function properly. When these resources are exhausted, the node may become unresponsive or unable to manage its workloads effectively, leading to a NotReady status. High CPU or memory usage can cause pods to be evicted or fail to start. Disk pressure occurs when disk usage exceeds a certain threshold, leading to the node being marked as NotReady.

2. Issues with the Kubelet: The kubelet is the agent running on each node, responsible for managing pods and containers. If the kubelet crashes or is misconfigured, it can't communicate with the API server, and the node may become NotReady. In such cases, the node's Ready condition may show a reason such as KubeletNotReady.

3. Issues with Kube-Proxy: Kube-Proxy is responsible for maintaining network rules on nodes. If kube-proxy crashes or is misconfigured, it can disrupt network traffic, leading to the node being marked as NotReady.

4. Connectivity Issues: Network connectivity is important for nodes to communicate with the control plane and other nodes in the cluster. Network misconfigurations can disrupt this communication, causing nodes to fail to report their status and leading to a NotReady state.

Note: During the initial phase of a node joining the cluster, it may temporarily display a NotReady status. This is a normal part of the process and should not be a cause for concern unless the node remains in this state for an extended period.

Diagnosing the Kubernetes Node NotReady Error:

1. Check the Node Status: To confirm that the problem is an unhealthy node, run kubectl get nodes and look for NotReady in the STATUS column. This command provides the current status of every node. If a node is marked as NotReady, it is not functioning correctly and will not receive new pods.

kubectl get nodes
NAME             STATUS        AGE     VERSION
node-1           Ready         18d     v1.29.1
node-2           NotReady      12d     v1.29.1
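On a larger cluster it can help to reduce that output to the unhealthy nodes only. The following is a minimal sketch rather than a built-in kubectl feature: not_ready_nodes is a hypothetical helper name, and it assumes the STATUS value is the second whitespace-separated column, as in the output above.

```shell
# Hypothetical helper: print the names of nodes whose STATUS is not Ready.
# Reads `kubectl get nodes --no-headers` output on stdin and assumes the
# STATUS value is the second whitespace-separated column.
not_ready_nodes() {
  # A cordoned but healthy node reports "Ready,SchedulingDisabled", so compare
  # only the first comma-separated item instead of the whole field.
  awk '{ split($2, s, ","); if (s[1] != "Ready") print $1 }'
}

# Usage (requires access to a cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Comparing only the first item of the STATUS field keeps cordoned nodes out of the result, since they are unschedulable but not unhealthy.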

2. Check the Node's Details and Conditions: To gain more information about the node, you can use the kubectl describe node <name> command. This command provides detailed information about the node, including its conditions and events. By examining the conditions section, you can identify specific issues such as memory pressure, disk pressure, or network problems that might be causing the node to be NotReady.

kubectl describe node node-2

Conditions:
Type             Status   Reason                         Message
----             ------   ------                         -------
MemoryPressure   True     KubeletHasInsufficientMemory   kubelet has insufficient memory available
DiskPressure     False    KubeletHasNoDiskPressure       kubelet has no disk pressure
PIDPressure      False    KubeletHasSufficientPID        kubelet has sufficient PID available
Ready            False    KubeletNotReady                Node is under memory pressure

MemoryPressure - indicates whether the node is running low on available memory.

DiskPressure - indicates whether the node is running out of disk space.

PIDPressure - indicates whether too many processes are running on the node.

If any of these conditions is True, the node is under resource pressure and may not be able to handle workloads effectively. The kubelet may start evicting pods, and in severe cases the node is marked as NotReady, indicating it cannot run pods. To learn more about node conditions, refer to the Kubernetes documentation on node status.
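To check these conditions without reading the full describe output, the condition list can be reduced to the entries that need attention. This is a sketch under stated assumptions: unhealthy_conditions is a hypothetical helper, it expects one Type=Status pair per line, and node-2 is the example node used in this article.

```shell
# Hypothetical helper: read "Type=Status" lines on stdin and print the
# conditions that point to a problem.
unhealthy_conditions() {
  awk -F= '
    # Ignore blank or malformed lines.
    NF != 2 { next }
    # Ready is healthy only when True; False and Unknown both need attention.
    $1 == "Ready" && $2 != "True" { print $1 " is " $2 }
    # Every other condition (pressure, network) is healthy when False.
    $1 != "Ready" && $2 != "False" { print $1 " is " $2 }
  '
}

# Usage (requires access to a cluster; node-2 is the example node):
#   kubectl get node node-2 \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | unhealthy_conditions
```

A healthy node produces no output, which makes the helper easy to use in a loop over several nodes.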

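3. Kubelet Issue: If all conditions are Unknown, it usually means the kubelet on the node has stopped reporting, either because it is down or because the node cannot reach the control plane, causing the node to go into a NotReady state.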

Conditions:
Type            Status     Reason
----            ------     ------
MemoryPressure  Unknown    NodeStatusUnknown       
DiskPressure    Unknown    NodeStatusUnknown       
PIDPressure     Unknown    NodeStatusUnknown      
Ready           Unknown    NodeStatusUnknown

The kubelet is the node's primary point of contact with the control plane and manages the lifecycle of containers on the node. If it is not running, the node cannot report its status correctly. To diagnose this, check the kubelet logs on the node for any errors or issues.
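Kubelet logs can be long, so it often helps to look at error lines first. The sketch below assumes the kubelet runs as a systemd unit named kubelet and writes the default klog text format, in which each message starts with a severity letter followed by the month and day (for example E0102). kubelet_errors is a hypothetical helper name, and the filter will not match if the kubelet is configured for JSON logging.

```shell
# Hypothetical helper: keep only error (E) and fatal (F) lines from kubelet
# logs read on stdin. Assumes the klog text format, where each message starts
# with a severity letter followed by the month and day, e.g. "E0102 ...".
kubelet_errors() {
  grep -E '^[EF][0-9]{4} '
}

# Usage (run on the node; -o cat strips the journal prefix so that each line
# starts with the klog header):
#   sudo journalctl -u kubelet -o cat --since "1 hour ago" | kubelet_errors
```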

4. Check Kubernetes System Pods: To diagnose, check the status of the Kubernetes system pods using the kubectl get pods -n kube-system command. These pods are critical for the operation of the cluster, and if any of them are not running correctly, it can affect the status of the nodes.

kubectl get pods -n kube-system 
NAME                          READY   STATUS            RESTARTS
coredns-558bd4d5db-7x8k4       1/1     Running            0
coredns-558bd4d5db-8x9k5       1/1     Running            0
etcd-master                    1/1     Running            0
kube-apiserver-master          1/1     Running            0
kube-controller-manager-master 1/1     Running            0
kube-proxy-4x8k4               0/1     CrashLoopBackOff   5
kube-scheduler-master          1/1     Running            0

To see only the pods assigned to a specific node, you can use:

kubectl get pods -n namespace-name --field-selector spec.nodeName=node-name
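The pod list can also be narrowed to the pods that are not healthy. This is a minimal sketch, assuming the default column order of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE); failing_pods is a hypothetical helper name.

```shell
# Hypothetical helper: print pods that are not fully healthy. Reads
# `kubectl get pods --no-headers` output on stdin, where the columns are
# NAME, READY, STATUS, RESTARTS, AGE.
failing_pods() {
  awk '
    # Split the READY column ("1/1") into ready and total container counts.
    { split($2, r, "/") }
    # Flag pods that are not Running, or Running with containers not ready.
    # Completed pods are skipped because they finished successfully.
    $3 != "Completed" && ($3 != "Running" || r[1] != r[2]) {
      print $1 " (" $3 ", " $2 " ready)"
    }
  '
}

# Usage (requires access to a cluster; node-2 is the example node):
#   kubectl get pods -n kube-system --no-headers \
#     --field-selector spec.nodeName=node-2 | failing_pods
```

For the kube-system listing shown above, this would report only the kube-proxy pod that is in CrashLoopBackOff.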

5. Check Connectivity: To diagnose connectivity problems, print the node's conditions with the following command (it requires jq):

kubectl get node ${NODE_NAME} -o json | jq -r '.status.conditions[] | .type + " is " + .status'

Look for the NetworkUnavailable condition, which also appears in the Conditions section of kubectl describe node. If its status is True, the node has a connectivity issue.

Conditions:
Type                 Status
----                 ------
NetworkUnavailable    True

Fixing Node NotReady Issues:

1. Resolve Lack of Resources: You can increase the resources available to the node or reduce resource consumption by scaling down the workloads or optimizing the resource requests and limits for your pods. You can use commands like top or htop on the node to monitor resource usage. Identifying and shutting down non-Kubernetes processes, running malware scans, upgrading the node, and checking for hardware issues or misconfigurations can also help conserve resources and resolve the issue.
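Resource usage can also be checked from outside the node with kubectl top nodes, which only works when the Metrics Server (or a compatible metrics API) is installed in the cluster. The sketch below flags nodes above a usage threshold; busy_nodes is a hypothetical helper, and the default threshold of 80 percent is an arbitrary assumption, not a Kubernetes default.

```shell
# Hypothetical helper: print nodes whose CPU% or MEMORY% is at or above a
# threshold (first argument, default 80). Reads `kubectl top nodes` output on
# stdin, where the columns are NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%.
busy_nodes() {
  awk -v limit="${1:-80}" '
    NR == 1 && $1 == "NAME" { next }   # skip the header row if present
    {
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem)
      # Non-numeric values such as <unknown> evaluate to 0 and are not flagged.
      if (cpu + 0 >= limit + 0 || mem + 0 >= limit + 0)
        print $1 ": cpu " $3 ", memory " $5
    }
  '
}

# Usage (requires access to a cluster with the Metrics Server installed):
#   kubectl top nodes | busy_nodes        # default threshold of 80 percent
#   kubectl top nodes | busy_nodes 90     # stricter threshold
```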

2. Resolve Kubelet Issues: To resolve kubelet issues, SSH into the node and run systemctl status kubelet. The status can be: active (running), active (exited), or inactive (dead).

active (running): The kubelet is operational, and the issue might be elsewhere.

active (exited): The kubelet exited, possibly due to an error. Restart it using sudo systemctl restart kubelet.

inactive (dead): The kubelet is stopped, possibly after a crash. Use journalctl -u kubelet to examine the logs and identify the cause, then start it again with sudo systemctl start kubelet.

If the kubelet service is running and has the necessary permissions but the node is still NotReady, look in the kubelet logs as well. The kubelet may be logging errors without crashing.
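The three cases above can be turned into a small triage helper. This is only a sketch: kubelet_next_step is a hypothetical name, it assumes the usual "Active: state (sub-state)" line printed by systemctl status, and any state it does not recognize is left for manual inspection.

```shell
# Hypothetical helper: read `systemctl status kubelet` output on stdin and
# print a suggested next step based on its "Active:" line.
kubelet_next_step() {
  # Extract e.g. "active (running)" from "   Active: active (running) since ...".
  state="$(sed -n 's/^ *Active: \([a-z]*\) (\([a-z-]*\)).*/\1 (\2)/p' | head -n 1)"
  case "$state" in
    "active (running)")
      echo "kubelet is running: check its logs with journalctl -u kubelet" ;;
    "active (exited)")
      echo "kubelet exited: restart it with sudo systemctl restart kubelet" ;;
    "inactive (dead)")
      echo "kubelet is stopped: read journalctl -u kubelet, then start it" ;;
    *)
      echo "unrecognized state: inspect systemctl status kubelet manually" ;;
  esac
}

# Usage (run on the node):
#   systemctl status kubelet --no-pager | kubelet_next_step
```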

3. Resolve Kube-Proxy Issues: Check the logs of the kube-proxy pod to identify any errors or warnings. Make sure the kube-proxy DaemonSet is configured correctly. If you identify any issues with the kube-proxy pod, you can delete it to force a restart. The DaemonSet controller will automatically create a new pod.
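The log check and the forced restart can be combined as follows. This is a sketch, not a definitive procedure: restart_kube_proxy is a hypothetical helper, and it assumes the kube-proxy pods run in kube-system with the label k8s-app=kube-proxy, which is what kubeadm-based clusters use. Other distributions may label or deploy kube-proxy differently.

```shell
# Hypothetical helper: restart kube-proxy on one node by deleting its pod so
# that the DaemonSet controller recreates it.
# Assumption: the pods carry the label k8s-app=kube-proxy (kubeadm default).
restart_kube_proxy() {
  node="$1"
  # -o name prints the pod as "pod/<name>", which logs and delete both accept.
  pod="$(kubectl get pods -n kube-system -l k8s-app=kube-proxy \
    --field-selector "spec.nodeName=${node}" -o name)"
  if [ -z "$pod" ]; then
    echo "no kube-proxy pod found on ${node}" >&2
    return 1
  fi
  # Show the recent logs before deleting the pod, so the cause is not lost.
  kubectl logs -n kube-system "$pod" --tail=50
  kubectl delete -n kube-system "$pod"
}

# Usage (requires access to a cluster; node-2 is the example node):
#   restart_kube_proxy node-2
```

Printing the logs before the delete matters because the replacement pod starts with an empty log.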

4. Resolve Connectivity Issues: Check the network configuration on the node, ensure that the necessary ports are open, and verify that the network plugins are correctly installed and configured. You can use commands like ping or traceroute to test network connectivity from the node to other nodes or external endpoints.

ping node-ip 
traceroute node-ip
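Because ping only shows that a host is reachable, it can be worth testing the specific TCP ports as well. The sketch below assumes a netcat variant that supports the -z and -w options is installed on the node. check_port is a hypothetical helper, and the ports in the usage lines (6443 for the API server, 10250 for the kubelet) are common defaults that may differ in your cluster.

```shell
# Hypothetical helper: report whether a TCP port on a host accepts
# connections. Assumes a netcat (nc) that supports -z (scan only, send no
# data) and -w (timeout in seconds).
check_port() {
  host="$1"
  port="$2"
  if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} unreachable"
  fi
}

# Usage (replace the placeholders with real addresses):
#   check_port <control-plane-ip> 6443    # API server, from the worker node
#   check_port <node-ip> 10250            # kubelet, from the control plane
```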

We have discussed in detail the various causes of the Kubernetes Node NotReady state and their solutions. Whether it is a lack of resources, problems with Kubelet or Kube-Proxy, or network connectivity issues, by following the above steps, you can identify this issue, resolve it effectively, and ensure a more stable and efficient Kubernetes environment.

Fix Kubernetes Node NotReady Errors 10x Faster with PerfectScale by DoiT

The PerfectScale Kubernetes governance platform continuously monitors workload behavior and detects signs of instability, such as OOM events or CPU throttling, that often lead to crash loops.

Identify up to 30 Different Types of Resilience Risks

PerfectScale tracks up to 30 alerts tailored specifically for Kubernetes. They cover a wide range of potential issues, such as pod failures or resource exhaustion, eliminating the need for deep Kubernetes expertise.

Real-time Alerts without Alert Fatigue

PerfectScale allows users to easily set up alerts for their clusters and manage them efficiently with Alert Profiles. You can easily monitor and get notified about alerts that are relevant for your setup.

For faster updates, utilize Slack or MS Teams Integration Profiles to receive notifications when an Alert is generated.

Get Actionable Recommendations to Eliminate Kubernetes Node NotReady Error

By leveraging real usage data and resilience-aware policies, PerfectScale delivers precise recommendations for right-sizing workloads at the container level, ensuring workloads have exactly what they need. Whether applied manually or autonomously for immediate impact, these recommendations restore workload stability, eliminate recurring restarts, and help prevent similar failures in the future.

To apply the recommendations, you can effortlessly copy the .yaml and deploy it to your cluster.

Troubleshoot Kubernetes Node NotReady with PerfectScale

Join industry leaders like Paramount Pictures and Creditas who have already optimized their Kubernetes environments with PerfectScale. Sign up or Book a demo with our technical experts now!

