OC Event Logging Troubleshooting

Mindwatering Incorporated

Author: Tripp W Black

Created: 05/20 at 09:19 PM

Category:
RH OpenShift
Troubleshooting

Task:
Troubleshooting a failed or non-starting pod usually begins with the oc get events command.

Sequence of Commands to Narrow Down an Issue with a Project:
Switch to the project and view the latest events for the <project_name> namespace.
- Syntax:
$ oc project <project_name>
$ oc get events --sort-by='.lastTimestamp'

Get events again, but restricted to pod issues:
- Syntax:
$ oc get events --sort-by='.lastTimestamp' --field-selector involvedObject.kind=Pod

Get events again, but this time for a specific pod that appears to be the cause:
- Syntax:
$ oc get events --field-selector involvedObject.name=<project_pod_name>

Note:
- Instead of switching to a project, I could have included the namespace in the first query above.
- Syntax:
$ oc get events -n <project_name> --sort-by='.lastTimestamp'
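The sequence above can be wrapped in one small shell helper. This is only a minimal sketch: the function name pod_events is hypothetical, and it assumes oc is already logged in to the cluster. It passes the namespace with -n rather than switching projects, and joins the two field selectors with a comma, which the API treats as a logical AND.

```shell
# Hypothetical helper: show events for one pod, sorted by last timestamp.
# Usage: pod_events <project_name> <project_pod_name>
pod_events() {
  if [ "$#" -ne 2 ]; then
    echo "usage: pod_events <project_name> <project_pod_name>" >&2
    return 2
  fi
  # Comma-separated field selectors are ANDed together.
  oc get events -n "$1" \
    --sort-by='.lastTimestamp' \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$2"
}
```

- Syntax:
$ pod_events <project_name> <project_pod_name>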

Working with different severity levels of events:
Typically, the issues show up in two columns:
- The Type column, e.g. Warning (vs Normal)
- The Reason column, e.g. Unhealthy.

Note:
- The --output wide flag allows additional columns/fields to be displayed and enables soft-wrap of otherwise truncated lines.
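When only the Type and Reason columns matter, one alternative to --output wide is to pick the columns explicitly with -o custom-columns. The sketch below is an illustration, not part of the original procedure: the function name event_columns and the column headings are hypothetical, while the field paths (.type, .reason, .involvedObject.name, .message, .lastTimestamp) are fields of the Event object.

```shell
# Hypothetical helper: print only the columns that usually reveal the problem.
# Usage: event_columns <project_name>
event_columns() {
  oc get events -n "$1" \
    --sort-by='.lastTimestamp' \
    -o custom-columns='LAST_SEEN:.lastTimestamp,TYPE:.type,REASON:.reason,OBJECT:.involvedObject.name,MESSAGE:.message'
}
```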

Filter to warnings only to find the needle-in-the-haystack:
- Syntax:
$ oc get events --field-selector type=Warning -n=<project_name>

Filter to warnings only as a cluster-wide scan:
- Syntax:
$ oc get events --field-selector type=Warning --all-namespaces --output wide
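A cluster-wide scan can return a lot of rows, so it may help to first see which namespaces are producing the warnings. The following is a rough sketch under that assumption: the function name warnings_by_namespace is hypothetical, and it relies on awk and sort being available on the workstation. It prints one line per namespace with the warning count first, highest count at the top.

```shell
# Hypothetical helper: count Warning events per namespace, busiest first.
warnings_by_namespace() {
  # Emit one namespace name per Warning event, without a header row.
  oc get events --all-namespaces --field-selector type=Warning \
    --no-headers -o custom-columns='NAMESPACE:.metadata.namespace' |
    # Tally the names, skipping any blank lines.
    awk 'NF { count[$1]++ } END { for (ns in count) print count[ns], ns }' |
    # Sort by count (descending), then by namespace name.
    sort -k1,1nr -k2,2
}
```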

Filter the Reason column:
- Syntax:
$ oc get events --field-selector reason=<reason_words>

Note:
- Two very common reasons are Failed and Unhealthy.
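Field selectors only support equality tests (=, ==, !=), so a single query cannot ask for "Failed or Unhealthy". One workaround is to run the query once per reason. This is a sketch only: the function name events_for_reasons is hypothetical, and the reasons are whatever you pass on the command line.

```shell
# Hypothetical helper: run one events query per reason.
# Usage: events_for_reasons <project_name> <reason> [<reason> ...]
# The body runs in a subshell ( ... ) so its variables do not leak.
events_for_reasons() (
  ns=$1
  shift
  for reason in "$@"; do
    echo "== reason=$reason =="
    oc get events -n "$ns" \
      --field-selector "reason=$reason" \
      --sort-by='.lastTimestamp'
  done
)
```

- Syntax:
$ events_for_reasons <project_name> Failed Unhealthy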

Watching the Failure:
Sometimes the best way to see it is to watch it happen. This is especially useful when there is a repeatable issue that happens every time X occurs. For example, if there is a pod start-up issue, you can watch the events live and see the issue occur as it happens.
- Syntax:
$ oc get events --watch=true -n=<project_name>
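When reproducing an issue, it can be handy to keep a copy of what scrolled past. The sketch below narrows the watch to warnings and appends everything to a log file while still printing it to the terminal. The function name watch_warnings_to_file and the log file argument are hypothetical. Stop the watch with Ctrl+C.

```shell
# Hypothetical helper: watch Warning events and keep a copy in a file.
# Usage: watch_warnings_to_file <project_name> <log_file>
watch_warnings_to_file() {
  # tee -a appends to the log file and echoes each line to the terminal.
  oc get events -n "$1" --watch --field-selector type=Warning |
    tee -a "$2"
}
```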

Failures in OpenShift Config:
$ oc get events -n openshift-config --output wide

Failures in the OpenShift Container Platform (OCP) Nodes By Component Type:
Source: OCP docs chapter 6.1.x

Container Events:

BackOff	Back-off restarting the failed container.
Created	Container created.
Failed	Pull/Create/Start failed.
Killing	Killing the container.
Started	Container started.
Preempting	Preempting other pods.
ExceededGracePeriod	Container runtime did not stop the pod within specified grace period.
Unhealthy	Container is unhealthy.

Image Events:

BackOff	Back-off on container start or image pull.
ErrImageNeverPull	The image’s NeverPull Policy is violated.
Failed	Failed to pull the image.
InspectFailed	Failed to inspect the image.
Pulled	Successfully pulled the image or the container image is already present on the machine.
Pulling	Pulling the image.

Image Manager Events:

FreeDiskSpaceFailed	Free disk space failed.
InvalidDiskCapacity	Invalid disk capacity.

Pod Events:

FailedSync	Pod sync failed.
FailedKillPod	Failed to stop a pod.
FailedCreatePodContainer	Failed to create a pod container.
Failed	Failed to make pod data directories.
NetworkNotReady	Network is not ready.
FailedCreate	Error creating: <error-msg>.
SuccessfulCreate	Created pod: <pod-name>.
FailedDelete	Error deleting: <error-msg>.
SuccessfulDelete	Deleted pod: <pod-id>.

Pod AutoScaler Events:

SelectorRequired	Selector is required.
InvalidSelector	Could not convert selector into a corresponding internal selector object.
FailedGetObjectMetric	HPA was unable to compute the replica count.
InvalidMetricSourceType	Unknown metric source type.
ValidMetricFound	HPA was able to successfully calculate a replica count.
FailedConvertHPA	Failed to convert the given HPA.
FailedGetScale	HPA controller was unable to get the target’s current scale.
SucceededGetScale	HPA controller was able to get the target’s current scale.
FailedComputeMetricsReplicas	Failed to compute desired number of replicas based on listed metrics.
FailedRescale	New size: <size>; reason: <msg>; error: <error-msg>.
SuccessfulRescale	New size: <size>; reason: <msg>.
FailedUpdateStatus	Failed to update status.

Deployment Events:

DeploymentCancellationFailed	Failed to cancel deployment.
DeploymentCancelled	Canceled deployment.
DeploymentCreated	Created new replication controller.
IngressIPRangeFull	No available Ingress IP to allocate to service.

Load Balancer Events:

CreatingLoadBalancerFailed	Error creating load balancer.
DeletingLoadBalancer	Deleting load balancer.
EnsuringLoadBalancer	Ensuring load balancer.
EnsuredLoadBalancer	Ensured load balancer.
UnAvailableLoadBalancer	There are no available nodes for LoadBalancer service.
LoadBalancerSourceRanges	Lists the new LoadBalancerSourceRanges. For example, <old-source-range> <new-source-range>.
LoadbalancerIP	Lists the new IP address. For example, <old-ip> <new-ip>.
ExternalIP	Lists external IP address. For example, Added: <external-ip>.
UID	Lists the new UID. For example, <old-service-uid> <new-service-uid>.
ExternalTrafficPolicy	Lists the new ExternalTrafficPolicy. For example, <old-policy> <new-policy>.
HealthCheckNodePort	Lists the new HealthCheckNodePort. For example, <old-node-port> <new-node-port>.
UpdatedLoadBalancer	Updated load balancer with new hosts.
LoadBalancerUpdateFailed	Error updating load balancer with new hosts.
DeletingLoadBalancerFailed	Error deleting load balancer.
DeletedLoadBalancer	Deleted load balancer.

Host / Node Events:

FailedMount	Volume mount failed.
HostNetworkNotSupported	Host network not supported.
HostPortConflict	Host/port conflict.
InsufficientFreeCPU	Insufficient free CPU.
InsufficientFreeMemory	Insufficient free memory.
KubeletSetupFailed	Kubelet setup failed.
NilShaper	Undefined shaper.
NodeNotReady	Node is not ready.
NodeNotSchedulable	Node is not schedulable.
NodeReady	Node is ready.
NodeSchedulable	Node is schedulable.
NodeSelectorMismatching	Node selector mismatch.
OutOfDisk	Out of disk.
Rebooted	Node rebooted.
Starting	Starting kubelet.
FailedAttachVolume	Failed to attach volume.
FailedDetachVolume	Failed to detach volume.
VolumeResizeFailed	Failed to expand/reduce volume.
VolumeResizeSuccessful	Successfully expanded/reduced volume.
FileSystemResizeFailed	Failed to expand/reduce file system.
FileSystemResizeSuccessful	Successfully expanded/reduced file system.
FailedUnMount	Failed to unmount volume.
FailedMapVolume	Failed to map a volume.
FailedUnmapDevice	Failed to unmap device.
AlreadyMountedVolume	Volume is already mounted.
SuccessfulDetachVolume	Volume is successfully detached.
SuccessfulMountVolume	Volume is successfully mounted.
SuccessfulUnMountVolume	Volume is successfully unmounted.
ContainerGCFailed	Container garbage collection failed.
ImageGCFailed	Image garbage collection failed.
FailedNodeAllocatableEnforcement	Failed to enforce System Reserved Cgroup limit.
NodeAllocatableEnforced	Enforced System Reserved Cgroup limit.
UnsupportedMountOption	Unsupported mount option.
SandboxChanged	Pod sandbox changed.
FailedCreatePodSandBox	Failed to create pod sandbox.
FailedPodSandBoxStatus	Failed pod sandbox status.
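
To look for the node-level reasons listed above, one option is to restrict the query to warnings whose involved object is a Node. This is a sketch under assumptions: the function name node_warnings is hypothetical, and the optional argument is a node name. Some reasons in this table (for example FailedMount) are typically reported against the pod rather than the node, so this filter will not catch them.

```shell
# Hypothetical helper: Warning events recorded against Node objects.
# Usage: node_warnings [<node_name>]
# The body runs in a subshell ( ... ) so its variable does not leak.
node_warnings() (
  selector='involvedObject.kind=Node,type=Warning'
  # Narrow to a single node when a name is supplied.
  if [ -n "${1:-}" ]; then
    selector="$selector,involvedObject.name=$1"
  fi
  oc get events --all-namespaces \
    --field-selector "$selector" \
    --sort-by='.lastTimestamp'
)
```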

Volume Events:

FailedBinding	There are no persistent volumes available and no storage class is set.
VolumeMismatch	Volume size or class is different from what is requested in claim.
VolumeFailedRecycle	Error creating recycler pod.
VolumeRecycled	Occurs when volume is recycled.
RecyclerPod	Occurs when pod is recycled.
VolumeDelete	Occurs when volume is deleted.
VolumeFailedDelete	Error when deleting the volume.
ExternalProvisioning	Occurs when volume for the claim is provisioned either manually or via external software.
ProvisioningFailed	Failed to provision volume.
ProvisioningCleanupFailed	Error cleaning provisioned volume.
ProvisioningSucceeded	Occurs when the volume is provisioned successfully.
WaitForFirstConsumer	Delay binding until pod scheduling.

Network Events:

Starting	Starting OpenShift-SDN.
NetworkFailed	openshift-sdn - The pod’s network interface has been lost and the pod will be stopped.
NeedPods	kube-proxy - The service-port <serviceName>:<port> needs pods.

Scheduler and Daemon Set Events with Placement:

FailedScheduling	Failed to schedule pod: <pod-namespace>/<pod-name>. This event is raised for multiple reasons, for example: AssumePodVolumes failed, Binding rejected, etc.
Preempted	By <preemptor-namespace>/<preemptor-name> on node <node-name>.
Scheduled	Successfully assigned <pod-name> to <node-name>.
SelectingAll	This daemon set is selecting all pods. A non-empty selector is required.
FailedPlacement	Failed to place pod on <node-name>.
FailedDaemonPod	Found failed daemon pod <pod-name> on node <node-name>, will try to kill it.

Lifecycle Events:

FailedPostStartHook	Handler failed for pod start.
FailedPreStopHook	Handler failed for pre-stop.
UnfinishedPreStopHook	Pre-stop hook unfinished.

Cluster Events:

SystemOOM	There is an OOM (out of memory) situation on the cluster.