OC Event Logging Troubleshooting

Mindwatering Incorporated

Author: Tripp W Black

Created: 05/20 at 09:19 PM

 

Category:
RH OpenShift
Troubleshooting

Task:
Troubleshooting a failed or non-starting pod usually begins with the oc get events command.


Sequence of Commands to Narrow Down an Issue with a Project:
Switch to the project and view the latest events for the <project_name> namespace.
- Syntax:
$ oc project <project_name>
$ oc get events --sort-by='.lastTimestamp'

Get events again, but restricted to events about pods:
- Syntax:
$ oc get events --sort-by='.lastTimestamp' --field-selector involvedObject.kind=Pod

Get events again, but this time for a specific pod that appears to be the cause:
- Syntax:
$ oc get events --field-selector involvedObject.name=<project_pod_name>

Note:
- Instead of switching to a project, I could have included the namespace in the first query above.
- Syntax:
$ oc get events -n <project_name> --sort-by='.lastTimestamp'
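The three narrowing steps above can be kept together in one small bash function so they are run in the same order every time. This is a sketch, not part of the original procedure: the function name narrow_events is hypothetical, and it uses -n (as in the note) instead of oc project so your current project is left unchanged.

```shell
# Hypothetical helper (not an oc command): runs the narrowing sequence
# for one namespace, using -n instead of "oc project" so the current
# project is not switched.
# Usage: narrow_events <project_name> [<project_pod_name>]
narrow_events() {
  local ns="$1" pod="${2:-}"

  # Step 1: everything in the namespace; --sort-by is ascending, so newest is last.
  echo "== All events in ${ns} =="
  oc get events -n "$ns" --sort-by='.lastTimestamp'

  # Step 2: only events whose involved object is a Pod.
  echo "== Pod events only =="
  oc get events -n "$ns" --sort-by='.lastTimestamp' \
    --field-selector involvedObject.kind=Pod

  # Step 3: drill into a single pod, only when a pod name was given.
  if [ -n "$pod" ]; then
    echo "== Events for pod ${pod} =="
    oc get events -n "$ns" --field-selector "involvedObject.name=${pod}"
  fi
}
```

Usage, after sourcing the function in bash:
$ narrow_events <project_name> <project_pod_name>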


Working with different severity levels of events:
Problems usually show up in two columns:
- The Type column, e.g. Warning (vs Normal)
- The Reason column, e.g. Unhealthy.

Note:
- The --output wide flag allows additional columns/fields to be displayed and enables soft-wrap of otherwise truncated lines.
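As an alternative to --output wide, the standard custom-columns output format can put the Type and Reason columns right next to the object and the message. A sketch, assuming bash; the function name triage_columns and the column labels (LAST, TYPE, REASON, ...) are my own hypothetical choices, only the JSONPath fields come from the event object.

```shell
# Hypothetical helper: list a namespace's events showing only the fields
# that matter most for triage. Column labels before each ":" are arbitrary;
# the part after each ":" is a JSONPath into the event object.
# Usage: triage_columns <project_name>
triage_columns() {
  local ns="$1"
  oc get events -n "$ns" --sort-by='.lastTimestamp' \
    -o custom-columns='LAST:.lastTimestamp,TYPE:.type,REASON:.reason,KIND:.involvedObject.kind,NAME:.involvedObject.name,MESSAGE:.message'
}
```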

Filter to warnings only to find the needle-in-the-haystack:
- Syntax:
$ oc get events --field-selector type=Warning -n <project_name>
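Field selectors can be combined with a comma, and the conditions are ANDed. That lets the warnings filter and the pod filter from the first section be applied in one query. A sketch, assuming bash; warnings_for_kind is a hypothetical name and the default kind of Pod is my assumption.

```shell
# Hypothetical helper: Warning events about one kind of object in one
# namespace. "a,b" in --field-selector means a AND b.
# Usage: warnings_for_kind <project_name> [<kind>]   (kind defaults to Pod)
warnings_for_kind() {
  local ns="$1" kind="${2:-Pod}"
  oc get events -n "$ns" --sort-by='.lastTimestamp' \
    --field-selector "type=Warning,involvedObject.kind=${kind}"
}
```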

Filter to warnings only as a cluster-wide scan:
- Syntax:
$ oc get events --field-selector type=Warning --all-namespaces --output wide
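On a busy cluster the cluster-wide warning list gets long. One way to find the noisiest namespace first is to count warnings per namespace and reason with standard text tools. A sketch, assuming bash and the usual sort/uniq utilities; warning_summary is a hypothetical name. Note that it counts event objects, and a single event object can stand for many repeats (its COUNT column in --output wide).

```shell
# Hypothetical helper: cluster-wide Warning events grouped by namespace and
# reason, most frequent first. Counts event objects, not the repeat count.
warning_summary() {
  oc get events --all-namespaces --field-selector type=Warning \
    -o custom-columns='NAMESPACE:.metadata.namespace,REASON:.reason' \
    --no-headers \
    | sort | uniq -c | sort -rn
}
```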

Filter the Reason column:
- Syntax:
$ oc get events --field-selector reason=<reason>

Note:
- Two very common reasons are Failed and Unhealthy.
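Field selectors support only =, == and != (plus comma for AND); there is no OR. To look at both Failed and Unhealthy, run one query per reason. A sketch, assuming bash; events_by_reason is a hypothetical name.

```shell
# Hypothetical helper: one events query per reason, since field selectors
# cannot express "reason=Failed OR reason=Unhealthy".
# Usage: events_by_reason <project_name> <reason> [<reason> ...]
events_by_reason() {
  local ns="$1"; shift
  local reason
  for reason in "$@"; do
    echo "== reason=${reason} =="
    oc get events -n "$ns" --sort-by='.lastTimestamp' \
      --field-selector "reason=${reason}"
  done
}
```

Usage:
$ events_by_reason <project_name> Failed Unhealthy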


Watching the Failure:
Sometimes the best way to see a failure is to watch it happen. This is especially useful for a repeatable issue that occurs every time X happens. For example, with a pod start-up issue, you can watch the events live and see the problem appear as the pod starts.
- Syntax:
$ oc get events --watch=true -n <project_name>
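--watch=true first prints the events that already exist and then streams new ones. If you only want what happens from now on, oc get also accepts --watch-only, which skips the initial listing, and the warnings filter can be applied at the same time. A sketch, assuming bash; watch_warnings is a hypothetical name.

```shell
# Hypothetical helper: stream only NEW Warning events for a namespace.
# --watch-only skips the initial list; press Ctrl+C to stop.
# Usage: watch_warnings <project_name>
watch_warnings() {
  local ns="$1"
  oc get events -n "$ns" --watch-only --field-selector type=Warning
}
```

Run it in one terminal, then trigger the failing action (for example, start the pod again) in a second terminal.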


Failures in OpenShift Config (the openshift-config namespace):
- Syntax:
$ oc get events -n openshift-config --output wide
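The same check can be repeated across all of the platform's openshift-* namespaces, printing only warnings. A sketch, assuming bash and an account allowed to list namespaces; openshift_ns_warnings is a hypothetical name. Expect a lot of output, since a cluster has many openshift-* namespaces.

```shell
# Hypothetical helper: Warning events in every namespace whose name starts
# with "openshift-". "oc get namespaces -o name" prints lines such as
# "namespace/openshift-config", so the "namespace/" prefix is stripped first.
openshift_ns_warnings() {
  local ns
  for ns in $(oc get namespaces -o name | sed 's#^namespace/##' | grep '^openshift-'); do
    echo "== ${ns} =="
    oc get events -n "$ns" --field-selector type=Warning --output wide
  done
}
```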



OpenShift Container Platform (OCP) Event Reasons by Component Type:
Source: OCP docs chapter 6.1.x

Container Events:
BackOff: Back-off restarting the failed container.
Created: Container created.
Failed: Pull/Create/Start failed.
Killing: Killing the container.
Started: Container started.
Preempting: Preempting other pods.
ExceededGracePeriod: Container runtime did not stop the pod within the specified grace period.
Unhealthy: Container is unhealthy.


Image Events:
BackOff: Back-off on container start or image pull.
ErrImageNeverPull: The image’s NeverPull policy is violated.
Failed: Failed to pull the image.
InspectFailed: Failed to inspect the image.
Pulled: Successfully pulled the image, or the container image is already present on the machine.
Pulling: Pulling the image.
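BackOff appears in both the Container and Image lists above, so the reason alone does not say whether a container keeps crashing or its image cannot be pulled. The event message and the involved object's fieldPath (which names the container, e.g. spec.containers{app}) tell them apart. A sketch, assuming bash; backoff_events and the column labels are hypothetical.

```shell
# Hypothetical helper: BackOff events on pods, with the container each
# event refers to, how often it repeated, and the message text.
# Usage: backoff_events <project_name>
backoff_events() {
  local ns="$1"
  oc get events -n "$ns" --sort-by='.lastTimestamp' \
    --field-selector reason=BackOff,involvedObject.kind=Pod \
    -o custom-columns='POD:.involvedObject.name,CONTAINER:.involvedObject.fieldPath,COUNT:.count,MESSAGE:.message'
}
```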


Image Manager Events:
FreeDiskSpaceFailed: Free disk space failed.
InvalidDiskCapacity: Invalid disk capacity.


Pod Events:
FailedSync: Pod sync failed.
FailedKillPod: Failed to stop a pod.
FailedCreatePodContainer: Failed to create a pod container.
Failed: Failed to make pod data directories.
NetworkNotReady: Network is not ready.
FailedCreate: Error creating: <error-msg>.
SuccessfulCreate: Created pod: <pod-name>.
FailedDelete: Error deleting: <error-msg>.
SuccessfulDelete: Deleted pod: <pod-id>.


Pod AutoScaler Events:
SelectorRequired: Selector is required.
InvalidSelector: Could not convert selector into a corresponding internal selector object.
FailedGetObjectMetric: HPA was unable to compute the replica count.
InvalidMetricSourceType: Unknown metric source type.
ValidMetricFound: HPA was able to successfully calculate a replica count.
FailedConvertHPA: Failed to convert the given HPA.
FailedGetScale: HPA controller was unable to get the target’s current scale.
SucceededGetScale: HPA controller was able to get the target’s current scale.
FailedComputeMetricsReplicas: Failed to compute desired number of replicas based on listed metrics.
FailedRescale: New size: <size>; reason: <msg>; error: <error-msg>.
SuccessfulRescale: New size: <size>; reason: <msg>.
FailedUpdateStatus: Failed to update status.


Deployment Events:
DeploymentCancellationFailed: Failed to cancel deployment.
DeploymentCancelled: Canceled deployment.
DeploymentCreated: Created new replication controller.
IngressIPRangeFull: No available Ingress IP to allocate to service.


Load Balancer Events:
CreatingLoadBalancerFailed: Error creating load balancer.
DeletingLoadBalancer: Deleting load balancer.
EnsuringLoadBalancer: Ensuring load balancer.
EnsuredLoadBalancer: Ensured load balancer.
UnAvailableLoadBalancer: There are no available nodes for LoadBalancer service.
LoadBalancerSourceRanges: Lists the new LoadBalancerSourceRanges. For example, <old-source-range> → <new-source-range>.
LoadbalancerIP: Lists the new IP address. For example, <old-ip> → <new-ip>.
ExternalIP: Lists external IP address. For example, Added: <external-ip>.
UID: Lists the new UID. For example, <old-service-uid> → <new-service-uid>.
ExternalTrafficPolicy: Lists the new ExternalTrafficPolicy. For example, <old-policy> → <new-policy>.
HealthCheckNodePort: Lists the new HealthCheckNodePort. For example, <old-node-port> → <new-node-port>.
UpdatedLoadBalancer: Updated load balancer with new hosts.
LoadBalancerUpdateFailed: Error updating load balancer with new hosts.
DeletingLoadBalancer: Deleting load balancer.
DeletingLoadBalancerFailed: Error deleting load balancer.
DeletedLoadBalancer: Deleted load balancer.


Host / Node Events:
FailedMount: Volume mount failed.
HostNetworkNotSupported: Host network not supported.
HostPortConflict: Host/port conflict.
InsufficientFreeCPU: Insufficient free CPU.
InsufficientFreeMemory: Insufficient free memory.
KubeletSetupFailed: Kubelet setup failed.
NilShaper: Undefined shaper.
NodeNotReady: Node is not ready.
NodeNotSchedulable: Node is not schedulable.
NodeReady: Node is ready.
NodeSchedulable: Node is schedulable.
NodeSelectorMismatching: Node selector mismatch.
OutOfDisk: Out of disk.
Rebooted: Node rebooted.
Starting: Starting kubelet.
FailedAttachVolume: Failed to attach volume.
FailedDetachVolume: Failed to detach volume.
VolumeResizeFailed: Failed to expand/reduce volume.
VolumeResizeSuccessful: Successfully expanded/reduced volume.
FileSystemResizeFailed: Failed to expand/reduce file system.
FileSystemResizeSuccessful: Successfully expanded/reduced file system.
FailedUnMount: Failed to unmount volume.
FailedMapVolume: Failed to map a volume.
FailedUnmapDevice: Failed unmapped device.
AlreadyMountedVolume: Volume is already mounted.
SuccessfulDetachVolume: Volume is successfully detached.
SuccessfulMountVolume: Volume is successfully mounted.
SuccessfulUnMountVolume: Volume is successfully unmounted.
ContainerGCFailed: Container garbage collection failed.
ImageGCFailed: Image garbage collection failed.
FailedNodeAllocatableEnforcement: Failed to enforce System Reserved Cgroup limit.
NodeAllocatableEnforced: Enforced System Reserved Cgroup limit.
UnsupportedMountOption: Unsupported mount option.
SandboxChanged: Pod sandbox changed.
FailedCreatePodSandBox: Failed to create pod sandbox.
FailedPodSandBoxStatus: Failed pod sandbox status.
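Several reasons in the Host / Node list, such as NodeNotReady or Rebooted, are recorded against the Node object rather than a pod. Because nodes are cluster-scoped, the sketch below searches all namespaces instead of guessing where the events were stored. It assumes bash; node_events is a hypothetical name, and node names come from oc get nodes.

```shell
# Hypothetical helper: every event about one node, from all namespaces,
# oldest first.
# Usage: node_events <node_name>
node_events() {
  local node="$1"
  oc get events --all-namespaces --sort-by='.lastTimestamp' \
    --field-selector "involvedObject.kind=Node,involvedObject.name=${node}"
}
```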


Volume Events:
FailedBinding: There are no persistent volumes available and no storage class is set.
VolumeMismatch: Volume size or class is different from what is requested in claim.
VolumeFailedRecycle: Error creating recycler pod.
VolumeRecycled: Occurs when volume is recycled.
RecyclerPod: Occurs when pod is recycled.
VolumeDelete: Occurs when volume is deleted.
VolumeFailedDelete: Error when deleting the volume.
ExternalProvisioning: Occurs when volume for the claim is provisioned either manually or via external software.
ProvisioningFailed: Failed to provision volume.
ProvisioningCleanupFailed: Error cleaning provisioned volume.
ProvisioningSucceeded: Occurs when the volume is provisioned successfully.
WaitForFirstConsumer: Delay binding until pod scheduling.
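For a claim that stays Pending, it helps to look at events on PersistentVolumeClaim objects only. The sketch below also shows the != field-selector operator, keeping any event whose type is not Normal. It assumes bash; pvc_problems is a hypothetical name. Some reasons in the list above may be recorded with type Normal, so if the claim still looks stuck, drop the type!=Normal part and review all of its events.

```shell
# Hypothetical helper: non-Normal events about PersistentVolumeClaims in one
# namespace. Single quotes keep "!=" literal for the shell.
# Usage: pvc_problems <project_name>
pvc_problems() {
  local ns="$1"
  oc get events -n "$ns" --sort-by='.lastTimestamp' \
    --field-selector 'involvedObject.kind=PersistentVolumeClaim,type!=Normal'
}
```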


Network Events:
Starting: Starting OpenShift-SDN.
NetworkFailed: openshift-sdn - The pod’s network interface has been lost and the pod will be stopped.
NeedPods: kube-proxy - The service-port <serviceName>:<port> needs pods.


Scheduler and Daemon Set Events with Placement:
FailedScheduling: Failed to schedule pod: <pod-namespace>/<pod-name>. This event is raised for multiple reasons, for example: AssumePodVolumes failed, Binding rejected, etc.
Preempted: By <preemptor-namespace>/<preemptor-name> on node <node-name>.
Scheduled: Successfully assigned <pod-name> to <node-name>.
SelectingAll: This daemon set is selecting all pods. A non-empty selector is required.
FailedPlacement: Failed to place pod on <node-name>.
FailedDaemonPod: Found failed daemon pod <pod-name> on node <node-name>, will try to kill it.
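Since FailedScheduling is raised for many different underlying causes, the message is the useful part. A JSONPath template can print one compact line per event, namespace/pod followed by the message. A sketch, assuming bash; failed_scheduling is a hypothetical name.

```shell
# Hypothetical helper: one line per FailedScheduling event in the cluster,
# formatted as "<namespace>/<pod>: <message>". Text outside {} in the
# JSONPath template is printed literally; {"\n"} ends each line.
failed_scheduling() {
  oc get events --all-namespaces --field-selector reason=FailedScheduling \
    -o jsonpath='{range .items[*]}{.involvedObject.namespace}/{.involvedObject.name}: {.message}{"\n"}{end}'
}
```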


Lifecycle Events:
FailedPostStartHook: Handler failed for pod start.
FailedPreStopHook: Handler failed for pre-stop.
UnfinishedPreStopHook: Pre-stop hook unfinished.


Cluster Events:
SystemOOM: There is an OOM (out of memory) situation on the cluster.
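For a scripted or scheduled check, it can be handier to get a yes/no answer than a table. The sketch below returns exit status 0 when at least one SystemOOM event exists anywhere in the cluster and 1 otherwise. It assumes bash; has_system_oom is a hypothetical name. Because oc errors are discarded, a failed login also reads as "no OOM", so confirm the session (for example with oc whoami) first.

```shell
# Hypothetical helper: succeed (exit 0) only if any SystemOOM event exists.
# stderr is discarded, and a "No resources found" line is filtered out in
# case an oc version prints it to stdout; grep -q . then requires at least
# one remaining line.
has_system_oom() {
  oc get events --all-namespaces --field-selector reason=SystemOOM \
    --no-headers 2>/dev/null \
    | grep -v '^No resources found' \
    | grep -q .
}
```

Usage:
$ has_system_oom && echo "SystemOOM events found"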

