Kubernetes Events: How to View, Filter & Troubleshoot Them
TL;DR
Kubernetes Events are short-lived records that appear whenever something changes in your cluster — a pod starts, a container crashes, or a node runs out of room. They are stored for one hour by default, then deleted.
Here is what you need to know:
- Two types: Normal (things working as expected) and Warning (something went wrong)
- View them fast: Run kubectl get events or kubectl describe pod <pod-name>
- Filter by what matters: Use --field-selector type=Warning to cut through the noise
- They disappear quickly: Export events to a tool like Loki, Elasticsearch, or Splunk if you need them later
- Best for: Spotting the what and when of a problem — pair with logs and metrics to find the why
Common issues events catch: pods stuck in Pending, ImagePullBackOff, CrashLoopBackOff, and FailedScheduling.
What Are Kubernetes Events?
Kubernetes events are real-time, short-lived API resources generated whenever a state change, error, or meaningful activity occurs within your cluster. They are useful for debugging issues and auditing cluster health. Events capture details like what occurred, which object was affected, the time of occurrence, and a brief human-readable description.
Key characteristics:
- Types: Usually categorized as Normal (standard, business-as-usual changes) or Warning (indicating failures like back-off loops or scheduling issues).
- Retention: Events are ephemeral. They are only stored in etcd for up to one hour by default before they are garbage-collected.
- Core attributes: Each event includes Last Seen, Type, Reason, Object (e.g., Pod/web-server), and a Message detailing the state transition.
How to view events:
Use the built-in kubectl events command to inspect and monitor events directly from your terminal:
- View all recent events in the default namespace: kubectl events
- View events across all namespaces: kubectl events --all-namespaces
- Watch real-time events for a specific resource: kubectl events --for pod/<pod-name> --watch
- View recent events in YAML format: kubectl events -o yaml
This is part of a series of articles about Kubernetes troubleshooting.
In this article:
- Key Characteristics of Kubernetes Events
- Kubernetes Events vs. Logs vs. Metrics
- How to View Kubernetes Events
- How to Filter Kubernetes Events
- Common Kubernetes Event Types and Reasons
- Troubleshooting with Kubernetes Events
- Best Practices for Using Kubernetes Events
Key Characteristics of Kubernetes Events
1. Types
Kubernetes events fall into two main types, Normal and Warning:
- Normal events signify successful or expected operations, such as the creation of a pod or successful scheduling by the control plane. These events are informational and help users track the regular functioning of resources.
- Warning events indicate unexpected or problematic conditions, like scheduling failures or image pull errors. These warnings are crucial for quickly identifying and responding to issues before they escalate into larger outages or service disruptions.
Both types of events serve different purposes but are equally important for cluster observability. Normal events allow operators to confirm that resources are behaving as intended, while warning events act as an early warning system for errors or misconfigurations.
2. Retention
Kubernetes events are stored in the cluster's etcd datastore but are not retained indefinitely. By default, events are kept for one hour; the retention period is set by the kube-apiserver --event-ttl flag.
This short retention window is designed to minimize the performance impact on the cluster and prevent etcd from being overloaded with transient or repetitive event data. However, it also means that operators need to act quickly if they want to analyze or export events for further investigation.
Because events have limited retention, relying solely on the in-cluster event store for long-term auditing or post-mortem analysis is not recommended. For environments requiring longer event history, it is best practice to export events to an external logging or monitoring system.
3. Core Attributes
Each Kubernetes event contains several core attributes that provide context about what happened and where. Key attributes include:
- The involved object (such as a Pod or Node)
- The event type (Normal or Warning)
- A reason code that summarizes the cause
- A human-readable message
- Timestamps for when the event first and last occurred
These structured attributes allow for easy filtering, searching, and correlation with other observability data. Understanding these attributes is crucial for effective troubleshooting. For example, the reason and message fields often provide immediate clues about underlying issues, while timestamps help determine the sequence of related events. The involved object reference allows users to drill down into the resource affected.
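As a rough illustration of these attributes, the sketch below renders one event as a single summary line. The sample dictionary is hypothetical; it only mimics the shape of one item returned by kubectl get events -o json, and the summarize helper is an illustrative name, not part of any Kubernetes tooling.

```python
# Hypothetical sample event, shaped like one item from `kubectl get events -o json`.
sample_event = {
    "type": "Warning",
    "reason": "FailedScheduling",
    "message": "0/3 nodes are available: 3 Insufficient cpu.",
    "involvedObject": {"kind": "Pod", "namespace": "default", "name": "web-server"},
    "firstTimestamp": "2024-05-01T10:00:00Z",
    "lastTimestamp": "2024-05-01T10:04:30Z",
    "count": 6,
}


def summarize(event: dict) -> str:
    """Render the core attributes of an event on one line."""
    obj = event.get("involvedObject", {})
    target = f"{obj.get('kind', '?')}/{obj.get('name', '?')}"
    return (
        f"{event.get('lastTimestamp', '?')} {event.get('type', '?')} "
        f"{event.get('reason', '?')} {target}: {event.get('message', '')}"
    )


print(summarize(sample_event))
```

The reason and involved object come first because they are the fields most likely to be filtered on; the message is kept last as free text.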
Kubernetes Events vs. Logs vs. Metrics
Kubernetes Events, logs, and metrics each serve different purposes in cluster observability:
- Events capture high-level state changes and significant actions within Kubernetes resources, such as Pod scheduling or failures.
- Logs provide detailed, timestamped records of application or system activity, offering deep insight into what is happening inside containers or the Kubernetes components themselves.
- Metrics are numeric measurements collected over time, such as CPU usage or memory consumption, and are used for monitoring trends and triggering alerts.
While events are useful for tracing the lifecycle of cluster resources and identifying the cause of sudden changes, logs are better suited for debugging application behavior and diagnosing complex issues. Metrics enable operators to track resource health and performance at scale, supporting capacity planning and autoscaling.
How to View Kubernetes Events
Kubernetes provides several ways to view events directly from the command line using kubectl. The most common method is the kubectl get events command, which retrieves a list of events from the current namespace. This output includes details such as the event type, reason, involved object, and message.
To view events across all namespaces, users can add the --all-namespaces flag.
kubectl get events
kubectl get events --all-namespaces
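Because events are time-sensitive, sorting them by timestamp is often useful during troubleshooting. The --sort-by option of kubectl supports this; the command below sorts by creation time, and .lastTimestamp is a useful alternative key for events that repeat, because it reflects the most recent occurrence. Sorting helps operators understand the sequence of actions that occurred before a failure or deployment issue.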
kubectl get events --sort-by=.metadata.creationTimestamp
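The same ordering can be applied after export. The following is a minimal sketch, assuming the input is JSON in the shape produced by kubectl get events -o json; the sample payload and the helper names are hypothetical. It prefers lastTimestamp and falls back to the creation time when that field is missing or null.

```python
import json
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse a second-precision UTC timestamp such as 2024-05-01T10:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def event_time(event: dict) -> datetime:
    # Prefer lastTimestamp; fall back to the creation time when it is absent or null.
    raw = event.get("lastTimestamp") or event["metadata"]["creationTimestamp"]
    return parse_ts(raw)


def sort_events(event_list_json: str) -> list:
    """Return the items of an event list ordered from oldest to newest."""
    items = json.loads(event_list_json).get("items", [])
    return sorted(items, key=event_time)


# Hypothetical input in the shape of `kubectl get events -o json`.
sample = json.dumps({"items": [
    {"reason": "Started", "lastTimestamp": "2024-05-01T10:00:09Z",
     "metadata": {"creationTimestamp": "2024-05-01T10:00:09Z"}},
    {"reason": "Scheduled", "lastTimestamp": None,
     "metadata": {"creationTimestamp": "2024-05-01T10:00:01Z"}},
    {"reason": "Pulled", "lastTimestamp": "2024-05-01T10:00:07Z",
     "metadata": {"creationTimestamp": "2024-05-01T10:00:07Z"}},
]})

print([e["reason"] for e in sort_events(sample)])
```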
Users can also inspect events related to a specific resource with the kubectl describe command. For example, describing a Pod displays resource details along with a dedicated Events section at the bottom. This is one of the fastest ways to identify issues such as failed scheduling, container crashes, or image pull errors.
kubectl describe pod my-pod
In production environments, events are often integrated into centralized observability platforms such as Elasticsearch, Loki, or cloud-native monitoring tools. This enables longer retention, advanced searching, and correlation with logs and metrics, making troubleshooting more efficient at scale.
How to Filter Kubernetes Events
Filtering Kubernetes Events helps operators focus on the most relevant information during troubleshooting. Since clusters can generate large volumes of Events, filtering by namespace, object, type, or reason makes it easier to identify issues quickly.
1. The simplest approach is to limit results to a specific namespace.
kubectl get events -n production
2. Field selectors provide more advanced filtering capabilities. Users can filter Events by involved object name, kind, reason, or Event type. For example, the following command shows only Warning Events, which are typically associated with failures or abnormal conditions.
kubectl get events --field-selector type=Warning
3. To view events related to a specific pod, users can filter by the object name.
kubectl get events --field-selector involvedObject.name=my-pod
4. Filtering by reason is also useful when diagnosing recurring issues. For example, operators can search for scheduling failures across the cluster by adding --all-namespaces.
kubectl get events --all-namespaces --field-selector reason=FailedScheduling
5. Kubernetes events can also be streamed in real time using the --watch flag. This is valuable during deployments or incident response because it continuously displays new Events as they occur.
kubectl get events --watch
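Field selectors can be combined by separating them with commas, in which case all conditions must match. The sketch below builds such a command line; build_get_events_args is a hypothetical helper, and the double-underscore convention for dotted field names is an assumption made only for this example.

```python
def build_get_events_args(namespace=None, **fields):
    """Build an argv list for `kubectl get events` with a combined field selector.

    Keyword names use "__" in place of "." (involvedObject__name -> involvedObject.name).
    """
    args = ["kubectl", "get", "events"]
    if namespace is None:
        # No namespace given: query every namespace.
        args.append("--all-namespaces")
    else:
        args += ["-n", namespace]
    if fields:
        # Comma-separated selectors are ANDed together by the API server.
        selector = ",".join(
            f"{key.replace('__', '.')}={value}" for key, value in sorted(fields.items())
        )
        args += ["--field-selector", selector]
    return args


args = build_get_events_args(
    namespace="production", type="Warning", involvedObject__kind="Pod"
)
print(" ".join(args))
```

The resulting list can be passed to subprocess.run on a machine where kubectl is installed and configured; the sketch itself only assembles the arguments.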
For more advanced workflows, events are commonly exported to monitoring and logging platforms where users can apply complex queries, dashboards, and alerts. This allows teams to automate incident detection and correlate events with logs and metrics for faster root cause analysis.
Common Kubernetes Event Types and Reasons
Normal Events
Normal events represent expected or successful operations within the Kubernetes cluster. Examples include pod creation, successful image pulls, or nodes joining the cluster. These events provide confirmation that the control plane and resources are functioning as intended. By reviewing normal events, operators can verify that workflows, such as deployments or autoscaling, are progressing without errors and that the cluster is operating smoothly.
Importance:
Despite being informational, normal events still aid in troubleshooting and auditing. When investigating issues, confirming the presence or absence of normal events helps pinpoint where processes may have deviated from the expected path. For example, the absence of a “Scheduled” event during pod creation can indicate a scheduling problem, even if no warnings have been generated.
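One way to check for such a gap is to compare the Normal reasons recorded for a pod against the ones a healthy startup usually produces. This is a minimal sketch: the expected list is an assumption that may need adjusting for your workloads, and the sample events are hypothetical.

```python
# Assumption: Normal reasons a healthy pod usually records during startup.
EXPECTED_STARTUP_REASONS = ["Scheduled", "Pulled", "Created", "Started"]


def missing_reasons(events, pod_name, expected=EXPECTED_STARTUP_REASONS):
    """Return the expected Normal reasons that have not been recorded for a pod."""
    seen = {
        e["reason"]
        for e in events
        if e.get("type") == "Normal"
        and e.get("involvedObject", {}).get("kind") == "Pod"
        and e.get("involvedObject", {}).get("name") == pod_name
    }
    return [reason for reason in expected if reason not in seen]


# Hypothetical events: web-1 was scheduled and pulled its image; web-2 has no events.
events = [
    {"type": "Normal", "reason": "Scheduled",
     "involvedObject": {"kind": "Pod", "name": "web-1"}},
    {"type": "Normal", "reason": "Pulled",
     "involvedObject": {"kind": "Pod", "name": "web-1"}},
]

print(missing_reasons(events, "web-1"))
print(missing_reasons(events, "web-2"))
```

A pod whose first missing reason is Scheduled has probably not been placed on a node yet, which points toward the scheduler rather than the container runtime.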
Warning Events
Warning events indicate problems or unexpected conditions in the Kubernetes cluster. These events signal issues such as failed scheduling, image pull errors, or resource constraint violations. Warning events are generated when the control plane or underlying components encounter situations that require attention but do not necessarily result in immediate resource failure.
Importance:
Monitoring warning events is crucial for proactive cluster management. Since these events highlight deviations from normal operation, they often serve as the first indicator of misconfigurations, resource shortages, or infrastructure problems.
Troubleshooting with Kubernetes Events
1. Pod Stuck in Pending
A pod remaining in the Pending state usually indicates that Kubernetes cannot place the pod onto a node or complete one of the initialization steps required before startup. Events are often the fastest way to identify the root cause because they provide direct feedback from the scheduler and kubelet. Common reasons include insufficient CPU or memory resources, missing PersistentVolumeClaims, node taints, or unsatisfied affinity rules.
Operators can inspect events related to the pod using the following command:
kubectl describe pod my-pod
In the Events section, Kubernetes may display messages such as FailedScheduling with details explaining why no suitable Node was found. For example, an event may indicate that nodes lack available resources or that taints prevent scheduling. These messages help narrow troubleshooting efforts quickly without requiring deep inspection of scheduler logs.
Another useful approach is filtering scheduling Events across all namespaces:
kubectl get events --all-namespaces --field-selector reason=FailedScheduling
By reviewing these events, operators can determine whether the issue is isolated to a single workload or affects multiple pods across the cluster. Resolving Pending Pods often involves scaling cluster resources, adjusting resource requests, fixing storage dependencies, or updating scheduling constraints.
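To judge whether a scheduling problem is isolated or widespread, the FailedScheduling events can be grouped by the pod they refer to. The sketch below assumes events have already been loaded as dictionaries in the shape of kubectl get events -o json items; the sample data and the function name are hypothetical.

```python
from collections import Counter


def failed_scheduling_scope(events):
    """Count FailedScheduling occurrences per (namespace, pod name)."""
    pods = Counter()
    for e in events:
        if e.get("reason") != "FailedScheduling":
            continue
        obj = e.get("involvedObject", {})
        # The count field may be missing or null; treat that as a single occurrence.
        pods[(obj.get("namespace"), obj.get("name"))] += e.get("count") or 1
    return pods


# Hypothetical sample data.
events = [
    {"reason": "FailedScheduling", "count": 4,
     "involvedObject": {"namespace": "shop", "name": "cart-7d9"}},
    {"reason": "FailedScheduling", "count": None,
     "involvedObject": {"namespace": "shop", "name": "checkout-5c2"}},
    {"reason": "FailedScheduling", "count": 2,
     "involvedObject": {"namespace": "shop", "name": "cart-7d9"}},
    {"reason": "Pulled", "count": 1,
     "involvedObject": {"namespace": "shop", "name": "cart-7d9"}},
]

scope = failed_scheduling_scope(events)
print(f"{len(scope)} pods affected")
for (namespace, name), occurrences in scope.most_common():
    print(f"{namespace}/{name}: {occurrences}")
```

A single affected pod suggests a workload-level constraint such as an oversized request or a missing toleration, while many affected pods suggest a capacity or cluster configuration issue.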
2. ImagePullBackOff or ErrImagePull
ImagePullBackOff and ErrImagePull events occur when Kubernetes cannot download a container image from a registry. These issues are commonly caused by incorrect image names, missing tags, authentication failures, or network connectivity problems. Events provide detailed messages explaining why the image pull failed, making them essential for diagnosing deployment problems.
The fastest way to investigate is by describing the affected Pod:
kubectl describe pod my-pod
The Events section often includes messages such as Failed to pull image or Back-off pulling image. These messages may reveal errors like invalid image references, denied registry access, or missing credentials. If the registry requires authentication, Kubernetes may also indicate that the configured image pull secret is invalid or unavailable.
Operators should verify the image name and tag in the pod specification, confirm registry accessibility, and ensure that required secrets exist in the correct namespace. Once the issue is corrected, Kubernetes retries the pull automatically and starts the container if the pull succeeds.
3. CrashLoopBackOff
A CrashLoopBackOff event indicates that a container starts successfully but repeatedly crashes shortly afterward. Kubernetes continuously attempts to restart the container, increasing the delay between retries after each failure. This condition is commonly caused by application errors, invalid configuration, missing dependencies, or failed health checks.
Events help identify the restart pattern and associated failures:
kubectl describe pod my-pod
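The output may show Warning events with the reason BackOff and the message Back-off restarting failed container, alongside a CrashLoopBackOff waiting state in the container status. While events reveal the restart behavior, container logs usually provide the detailed cause of the crash.
Operators often combine Event analysis with log inspection (adding --previous shows the logs of the container instance that crashed):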
kubectl logs my-pod
Frequent restart events can also indicate failed liveness probes or resource exhaustion issues such as out-of-memory kills. In these cases, reviewing probe configurations and container resource limits is important.
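The growing delay between restarts can be worked out with simple arithmetic. The sketch below assumes the commonly documented kubelet behavior of a 10-second initial delay that doubles after each failure and is capped at five minutes; these defaults are assumptions that may differ by Kubernetes version or configuration.

```python
def backoff_delays(restarts, initial=10, factor=2, cap=300):
    """Return the delay in seconds before each of the next `restarts` restart attempts."""
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(min(delay, cap))  # never wait longer than the cap
        delay *= factor
    return delays


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

This explains why a crashing pod can appear idle for minutes at a time: after a handful of failures, each retry waits for the full cap before the container is started again.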
4. FailedScheduling
FailedScheduling events occur when the Kubernetes scheduler cannot assign a pod to any available node. These events are among the most common warning events in production clusters and typically indicate resource shortages or restrictive scheduling rules. The scheduler generates detailed messages describing why placement failed.
Operators can view scheduling-related events with:
kubectl get events --field-selector reason=FailedScheduling
Typical messages include insufficient CPU or memory, node affinity mismatches, taint conflicts, or volume topology restrictions. For example, an event may report that no nodes satisfy the pod's resource requests or that all nodes are marked with taints the pod cannot tolerate.
Understanding the event message is critical because FailedScheduling is a symptom rather than the root cause itself. Resolving the issue may involve adding cluster capacity, modifying resource requests, updating affinity rules, or configuring tolerations correctly. Since scheduling problems can affect many workloads simultaneously, monitoring these Events helps operators detect cluster-wide capacity or configuration issues early.
Best Practices for Using Kubernetes Events
The following practices help you get the most out of Kubernetes events.
1. Check Events Early During Troubleshooting
Kubernetes events should be one of the first places to look when diagnosing cluster or application issues. Events provide insight into recent state changes, failures, and control plane actions.
Commands such as kubectl describe or kubectl get events --sort-by=.metadata.creationTimestamp can expose scheduling failures, container restarts, or image pull issues. Reviewing events early helps identify whether the problem originates from Kubernetes infrastructure, workload configuration, or the application itself.
Because events are timestamped, sorting them reconstructs the sequence of actions leading up to a failure. This is useful during deployments, rollouts, or incident response.
2. Combine Events with Logs and Metrics
Events provide context, but they are most useful when used together with logs and metrics. Events explain what happened at the Kubernetes resource level, while logs reveal detailed application or component behavior. Metrics add performance and resource utilization data over time.
For example, a CrashLoopBackOff event may indicate repeated container restarts, but application logs are usually needed to determine why the container crashed. Metrics may reveal memory exhaustion or CPU throttling that contributed to the failure. Correlating these signals helps operators move from symptom detection to root cause analysis.
Observability platforms often integrate events, logs, and metrics into a unified dashboard. This allows teams to trace issues across multiple layers of the stack.
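Without such a platform, a basic correlation can still be done by merging events and log lines into one timeline. This is a minimal sketch: the records are hypothetical (timestamp, text) tuples, and it assumes both sources use the same UTC timestamp format so that the strings sort chronologically.

```python
import heapq


def build_timeline(events, log_lines):
    """Merge event and log records into one chronological list.

    Both inputs are lists of (timestamp, text) tuples already sorted by
    timestamp. Timestamps are assumed to share one UTC format and precision,
    so comparing them as strings matches chronological order.
    """
    tagged_events = [(ts, "event", text) for ts, text in events]
    tagged_logs = [(ts, "log", text) for ts, text in log_lines]
    return list(heapq.merge(tagged_events, tagged_logs))


# Hypothetical records for one crashing pod.
events = [
    ("2024-05-01T10:00:05Z", "Started container app"),
    ("2024-05-01T10:00:12Z", "Back-off restarting failed container"),
]
log_lines = [
    ("2024-05-01T10:00:06Z", "loading config"),
    ("2024-05-01T10:00:07Z", "fatal: DATABASE_URL is not set"),
]

for ts, source, text in build_timeline(events, log_lines):
    print(f"{ts} [{source}] {text}")
```

In this made-up timeline the fatal log line sits between the Started and back-off events, which is the kind of ordering that links a symptom to its cause.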
3. Export Events for Retention
Kubernetes events are temporary by design and are retained in etcd for only one hour by default. Because of this limited retention, diagnostic information can disappear quickly after an incident occurs. Exporting events to an external system ensures that historical event data remains available for analysis and auditing.
For example, many organizations forward events to centralized observability platforms such as Elasticsearch, Loki, Splunk, or cloud-native monitoring services. These systems provide longer retention, advanced querying, dashboards, and correlation with logs and metrics.
Retaining event history is useful for post-mortem investigations and identifying recurring operational patterns.
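As an illustration of what an export step might produce, the sketch below flattens events into JSON Lines, a format most log pipelines can ingest. The selected fields and the to_json_lines helper are assumptions for this example; a real exporter would also handle delivery, retries, and deduplication.

```python
import json

# Assumption: the subset of event fields worth keeping for later analysis.
EXPORT_FIELDS = ("type", "reason", "message", "count", "firstTimestamp", "lastTimestamp")


def to_json_lines(events):
    """Flatten events into one JSON document per line."""
    lines = []
    for e in events:
        obj = e.get("involvedObject", {})
        record = {field: e.get(field) for field in EXPORT_FIELDS}
        # Lift the involved object to top-level keys so it is easy to query.
        record["namespace"] = obj.get("namespace")
        record["kind"] = obj.get("kind")
        record["name"] = obj.get("name")
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)


# Hypothetical sample event.
events = [
    {"type": "Warning", "reason": "FailedScheduling", "count": 3,
     "message": "0/3 nodes are available: 3 Insufficient memory.",
     "firstTimestamp": "2024-05-01T10:00:00Z", "lastTimestamp": "2024-05-01T10:02:00Z",
     "involvedObject": {"kind": "Pod", "namespace": "shop", "name": "cart-7d9"}},
]

print(to_json_lines(events))
```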
4. Alert Only on High-Signal Events
Not all Kubernetes events require alerts. Large clusters generate high volumes of informational events, and alerting on every event can create noise and lead to alert fatigue. Instead, focus alerts on high-signal warning events that indicate operational problems or service risk.
Examples include repeated FailedScheduling, CrashLoopBackOff, ImagePullBackOff, or node-related Warning events. Filtering alerts based on event type, reason, frequency, or affected resources helps reduce unnecessary notifications.
Effective event alerting should prioritize actionable signals over raw event volume.
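A filter along these lines can be sketched in a few lines. The reason set and the repeat threshold below are assumptions to tune per cluster; note that the kubelet typically records restart back-off under the reason BackOff, so check which reason strings your cluster actually emits. The rule matches only structured fields, never message text.

```python
# Assumptions: reasons considered high-signal and the repeat threshold.
ALERT_REASONS = {"FailedScheduling", "BackOff", "FailedMount"}
MIN_COUNT = 3


def should_alert(event, reasons=ALERT_REASONS, min_count=MIN_COUNT):
    """Alert only on repeated Warning events with a high-signal reason."""
    return (
        event.get("type") == "Warning"
        and event.get("reason") in reasons
        # The count field may be missing or null; treat that as one occurrence.
        and (event.get("count") or 1) >= min_count
    )


# Hypothetical sample events.
events = [
    {"type": "Warning", "reason": "FailedScheduling", "count": 5},
    {"type": "Warning", "reason": "FailedScheduling", "count": 1},
    {"type": "Normal", "reason": "Pulled", "count": 10},
    {"type": "Warning", "reason": "BackOff", "count": None},
]

print([should_alert(e) for e in events])  # [True, False, False, False]
```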
5. Avoid Depending on Exact Event Messages
Kubernetes event messages are designed for human readability and may change between Kubernetes versions or implementations. Depending on exact message text in scripts, automation, or monitoring rules can create fragile workflows that break after upgrades or platform changes.
Instead of matching full event messages, rely on structured fields such as the event reason, type, involved object, or labels. These fields are more stable and intended for programmatic filtering and automation. For example, matching the FailedScheduling reason is more reliable than searching for a scheduler error string.
Going Beyond Events: Detecting and Resolving Kubernetes Issues with PerfectScale
Kubernetes Events tell you what went wrong, but acting on them across a large environment still means constant manual triage. PerfectScale by DoiT closes that gap by autonomously detecting and remediating the resiliency and performance issues that Events surface — such as OOM kills, CPU throttling, evictions, and repeated pod restarts — while continuously right-sizing workloads to keep clusters stable and deliver up to 99.99% availability.
Key capabilities of PerfectScale:
- Automatic issue remediation: Instantly identifies and fixes resiliency risks like out-of-memory kills, CPU throttling, evictions, and pod restarts to maximize uptime and eliminate latency.
- Configuration error prevention: Catches misconfigurations such as missing CPU and memory requests or limits, suspected memory leaks, and workloads hitting their maximum replica count before they trigger incidents.
- Infrastructure hardening: Provides holistic visibility across your nodes to proactively spot problems like node over-commitment and improper node affinities or taints that lead to evictions and failed scheduling.
- Impact-driven prioritization: Ranks issues in real time and aligns alerting with your SLAs and SLOs, so teams focus on the problems that most affect service consistency.
- Integrated alerting and ticketing: Sends instant notifications through Slack, MS Teams, or Datadog and lets you escalate any issue into a ticket in a single click.
Learn more about how PerfectScale autonomously boosts your Kubernetes resilience and performance.
Frequently Asked Questions
What are Kubernetes Events?
Kubernetes Events are real-time API objects that Kubernetes creates whenever something changes in your cluster. That includes normal activity like a pod being scheduled, and problems like a container crashing or a node running low on memory. Each event records what happened, which resource was affected, when it occurred, and a short description of the cause.
How long do Kubernetes Events last?
By default, Kubernetes keeps events for one hour. After that, they are automatically removed from etcd, the cluster's data store. If you need events to last longer — for audits, post-mortems, or trend analysis — you need to export them to an external system like Elasticsearch, Loki, or Splunk.
What is the difference between Normal and Warning events?
Normal events confirm that things are working as expected. Examples include a pod being created or an image being pulled successfully.
Warning events mean something went wrong or is at risk of going wrong. Examples include FailedScheduling, CrashLoopBackOff, ImagePullBackOff, and out-of-memory kills. Warning events are the ones to pay close attention to during troubleshooting.
How do I view Kubernetes Events?
The fastest way is to run one of these kubectl commands:
- All events in the current namespace: kubectl get events
- Events across every namespace: kubectl get events --all-namespaces
- Events sorted by time: kubectl get events --sort-by=.metadata.creationTimestamp
- Events for a specific pod: kubectl describe pod <pod-name>
- Watch events in real time: kubectl get events --watch
