# k8s Application Introspection and Debugging
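Once your application is running, you will inevitably need to debug problems with it. Earlier we described how to use `kubectl get pod` to retrieve simple status information about your pods. There are several ways to get much more information about your application, however.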
# Using kubectl describe pod to fetch details about pods
For this example we will use a Deployment to create two pods, similar to the earlier example.
nginx-dep.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 80
```
Create the Deployment with the following command:
```shell
$ kubectl create -f https://k8s.io/docs/tasks/debug-application-cluster/nginx-dep.yaml
deployment "nginx-deployment" created
$ kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1       Running   0          11s
nginx-deployment-1006230814-fmgu3   1/1       Running   0          11s
```
We can retrieve a lot more information about each of these pods with `kubectl describe pod`. For example:
```shell
$ kubectl describe pod nginx-deployment-1006230814-6winp
Name:           nginx-deployment-1006230814-6winp
Namespace:      default
Node:           kubernetes-node-wul5/10.240.0.9
Start Time:     Thu, 24 Mar 2016 01:39:49 +0000
Labels:         app=nginx,pod-template-hash=1006230814
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-1956810328","uid":"14e607e7-8ba1-11e7-b5cb-fa16" ...
Status:         Running
IP:             10.244.0.6
Controllers:    ReplicaSet/nginx-deployment-1006230814
Containers:
  nginx:
    Container ID:   docker://90315cc9f513c724e9957a4788d3e625a078de84750f244a40f97ae355eb1149
    Image:          nginx
    Image ID:       docker://6f62f48c4e55d700cf3eb1b5e33fa051802986b77b874cc351cce539e5163707
    Port:           80/TCP
    QoS Tier:
      cpu:      Guaranteed
      memory:   Guaranteed
    Limits:
      cpu:      500m
      memory:   128Mi
    Requests:
      memory:   128Mi
      cpu:      500m
    State:          Running
      Started:      Thu, 24 Mar 2016 01:39:51 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5kdvl (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  default-token-4bcbi:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4bcbi
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  FirstSeen  LastSeen  Count  From                            SubobjectPath           Type    Reason     Message
  ---------  --------  -----  ----                            -------------           ----    ------     -------
  54s        54s       1      {default-scheduler }                                    Normal  Scheduled  Successfully assigned nginx-deployment-1006230814-6winp to kubernetes-node-wul5
  54s        54s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Pulling    pulling image "nginx"
  53s        53s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Pulled     Successfully pulled image "nginx"
  53s        53s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Created    Created container with docker id 90315cc9f513
  53s        53s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Started    Started container with docker id 90315cc9f513
```
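Here you can see configuration information about the container(s) and the Pod (labels, resource requirements, and so on), as well as status information about the container(s) and the Pod (state, readiness, restart count, events, and so on).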
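The `QoS Class: Guaranteed` line in the output follows from the resource requirements. The manifest sets only `limits`, and Kubernetes defaults a missing request to the corresponding limit, which is why `Requests` matches `Limits` above. Below is a minimal sketch of the QoS classification rules, assuming a simplified input: only regular containers are considered (init containers are ignored), and quantities are compared as literal strings, so `"500m"` and `"0.5"` would be treated as different. The function names are hypothetical.

```python
def effective_requests(resources: dict) -> dict:
    """Requests after defaulting: a resource with a limit but no request
    gets a request equal to its limit."""
    requests = dict(resources.get("requests", {}))
    for name, limit in resources.get("limits", {}).items():
        requests.setdefault(name, limit)
    return requests


def qos_class(containers: list) -> str:
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        requests = effective_requests(resources)
        if limits or requests:
            any_set = True
        # Guaranteed: every container has cpu and memory limits equal to its requests.
        for name in ("cpu", "memory"):
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"  # no container sets any request or limit
    return "Guaranteed" if guaranteed else "Burstable"


nginx = {"name": "nginx", "resources": {"limits": {"memory": "128Mi", "cpu": "500m"}}}
print(qos_class([nginx]))  # Guaranteed
```

If the container asked for `requests: {cpu: 250m}` while keeping the same limits, the CPU request would no longer equal the limit and the Pod would become `Burstable`.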
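The container state is one of Waiting, Running, or Terminated. Depending on the state, additional information is provided. Here you can see that, for a container in the Running state, the system tells you when the container started.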
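Ready tells you whether the container passed its last readiness probe. (In this case the container has no readiness probe configured; if no readiness probe is configured, the container is assumed to be ready.)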
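Restart Count tells you how many times the container has been restarted. This information is useful for detecting crash loops in containers whose restart policy is `Always`.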
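The state and restart count also appear under `status.containerStatuses` when you fetch the Pod as JSON (for example with `kubectl get pod <name> -o json`). Below is a minimal sketch that reports which of the three states is set and flags likely crash loops. `CRASH_LOOP_THRESHOLD` is a hypothetical cut-off chosen for illustration. The check also looks for the kubelet's `CrashLoopBackOff` waiting reason.

```python
# Hypothetical cut-off for illustration; tune it for your workloads.
CRASH_LOOP_THRESHOLD = 5


def container_state(status: dict) -> str:
    """Return which state key is set: "waiting", "running" or "terminated"."""
    state = status.get("state", {})
    for key in ("waiting", "running", "terminated"):
        if key in state:
            return key
    return "unknown"


def crash_looping(pod: dict) -> list:
    """Names of containers that look like they are crash looping."""
    # Only containers that are restarted automatically can loop.
    if pod.get("spec", {}).get("restartPolicy", "Always") != "Always":
        return []
    flagged = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting_reason = cs.get("state", {}).get("waiting", {}).get("reason")
        if waiting_reason == "CrashLoopBackOff" or cs.get("restartCount", 0) >= CRASH_LOOP_THRESHOLD:
            flagged.append(cs["name"])
    return flagged


healthy = {
    "spec": {"restartPolicy": "Always"},
    "status": {"containerStatuses": [
        {"name": "nginx", "restartCount": 0,
         "state": {"running": {"startedAt": "2016-03-24T01:39:51Z"}}},
    ]},
}
print(container_state(healthy["status"]["containerStatuses"][0]), crash_looping(healthy))
```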
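Of the Pod conditions listed, Ready is the one that matters for traffic. It is a binary condition indicating that the Pod is able to serve requests and should be added to the load-balancing pools of all matching Services.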
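Lastly, you see a log of recent events related to your Pod. The system compresses multiple identical events into one entry, showing the first and last time it was seen and the number of occurrences. "From" indicates the component that logged the event, "SubobjectPath" tells you which object is being referred to (for example, a container within the Pod), and "Reason" and "Message" tell you what happened.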
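To make the FirstSeen / LastSeen / Count columns concrete, here is a simplified model of that compression. It merges events that share the same source, subobject, reason, and message. This sketch only illustrates the idea and is not Kubernetes' actual event-aggregation code. The tuple layout and the `CompressedEvent` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CompressedEvent:
    source: str
    subobject: str
    reason: str
    message: str
    first_seen: int  # timestamps in seconds
    last_seen: int
    count: int


def compress(events):
    """events: iterable of (timestamp, source, subobject, reason, message)."""
    merged = {}
    for ts, source, subobject, reason, message in events:
        key = (source, subobject, reason, message)
        ev = merged.get(key)
        if ev is None:
            merged[key] = CompressedEvent(source, subobject, reason, message, ts, ts, 1)
        else:
            ev.first_seen = min(ev.first_seen, ts)
            ev.last_seen = max(ev.last_seen, ts)
            ev.count += 1
    # dicts preserve insertion order, so entries come out in first-seen order
    return list(merged.values())


raw = [(t, "default-scheduler", "", "FailedScheduling", "failed to fit in any node")
       for t in (0, 2, 5, 9, 12, 14, 15)]
raw.append((1, "kubelet", "spec.containers{nginx}", "Pulling", 'pulling image "nginx"'))
for ev in compress(raw):
    print(ev.first_seen, ev.last_seen, ev.count, ev.reason)
```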
# Example: debugging Pending Pods
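A common scenario you can detect with events is a Pod that won't fit on any node. For example, the Pod might request more resources than are free on any node, or it might specify a label selector that doesn't match any node. Suppose we created the Deployment above with 5 replicas (instead of 2), each requesting 600 millicores instead of 500, on a four-node cluster where each (virtual) machine has 1 CPU. In that case, one of the Pods cannot be scheduled. (Note that because cluster add-on pods such as fluentd and skydns run on each node, if we had requested 1000 millicores, none of the Pods could be scheduled.)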
```shell
$ kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1       Running   0          7m
nginx-deployment-1006230814-fmgu3   1/1       Running   0          7m
nginx-deployment-1370807587-6ekbw   1/1       Running   0          1m
nginx-deployment-1370807587-fg172   0/1       Pending   0          1m
nginx-deployment-1370807587-fz9sd   0/1       Pending   0          1m
```
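With many pods, it helps to pick out the ones that are not Running before describing them. Below is a minimal sketch that parses the plain-text table shown above. `not_running` is a hypothetical helper. It assumes the default `NAME READY STATUS ...` layout and that the columns up to STATUS contain no spaces. For anything beyond a quick script, parsing `kubectl get pods -o json` is more robust.

```python
SAMPLE = """\
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-1006230814-6winp   1/1       Running   0          7m
nginx-deployment-1006230814-fmgu3   1/1       Running   0          7m
nginx-deployment-1370807587-6ekbw   1/1       Running   0          1m
nginx-deployment-1370807587-fg172   0/1       Pending   0          1m
nginx-deployment-1370807587-fz9sd   0/1       Pending   0          1m
"""


def not_running(table: str) -> list:
    """Names of pods whose STATUS column is anything other than Running."""
    lines = table.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    names = []
    for line in lines[1:]:
        # Whitespace split: safe only while NAME, READY and STATUS contain no spaces.
        cols = line.split()
        if cols[status_col] != "Running":
            names.append(cols[name_col])
    return names


print(not_running(SAMPLE))
```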
To find out why the nginx-deployment-1370807587-fz9sd pod is not running, run `kubectl describe pod` on the pending Pod and look at its events:
```shell
$ kubectl describe pod nginx-deployment-1370807587-fz9sd
Name:           nginx-deployment-1370807587-fz9sd
Namespace:      default
Node:           /
Labels:         app=nginx,pod-template-hash=1370807587
Status:         Pending
IP:
Controllers:    ReplicaSet/nginx-deployment-1370807587
Containers:
  nginx:
    Image:      nginx
    Port:       80/TCP
    QoS Tier:
      memory:   Guaranteed
      cpu:      Guaranteed
    Limits:
      cpu:      1
      memory:   128Mi
    Requests:
      cpu:      1
      memory:   128Mi
    Environment Variables:
Volumes:
  default-token-4bcbi:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4bcbi
Events:
  FirstSeen  LastSeen  Count  From                   SubobjectPath  Type     Reason            Message
  ---------  --------  -----  ----                   -------------  --------  ------           -------
  1m         48s       7      {default-scheduler }                  Warning  FailedScheduling  pod (nginx-deployment-1370807587-fz9sd) failed to fit in any node
  fit failure on node (kubernetes-node-6ta5): Node didn't have enough resource: CPU, requested: 1000, used: 1420, capacity: 2000
  fit failure on node (kubernetes-node-wul5): Node didn't have enough resource: CPU, requested: 1000, used: 1100, capacity: 2000
```
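Here you can see the event generated by the scheduler saying that the Pod failed to schedule with reason FailedScheduling (and possibly others). The message tells us that no node had enough free resources to satisfy the Pod's requirements.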
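The fit failures can be checked by hand. On kubernetes-node-6ta5, 1420 + 1000 = 2420 millicores exceeds the 2000 capacity, and on kubernetes-node-wul5, 1100 + 1000 = 2100 also exceeds it. Below is a minimal sketch of the two feasibility checks mentioned earlier: a label selector that must match, and a CPU request that must fit. The node dictionaries are a hypothetical, simplified shape, not the Kubernetes API objects, and the numbers are copied from the event message. The real scheduler runs many more checks than these two.

```python
from decimal import Decimal


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m", "1" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def feasible(node: dict, pod: dict) -> bool:
    # Label check: every nodeSelector entry must be on the node with the same value.
    for key, value in pod.get("nodeSelector", {}).items():
        if node["labels"].get(key) != value:
            return False
    # Resource check: the new request must fit into what is not already used.
    requested = cpu_millicores(pod["cpu_request"])
    return node["cpu_used"] + requested <= node["cpu_capacity"]


# Figures (in millicores) taken from the FailedScheduling message above.
nodes = [
    {"name": "kubernetes-node-6ta5", "labels": {"kubernetes.io/hostname": "kubernetes-node-6ta5"},
     "cpu_used": 1420, "cpu_capacity": 2000},
    {"name": "kubernetes-node-wul5", "labels": {"kubernetes.io/hostname": "kubernetes-node-wul5"},
     "cpu_used": 1100, "cpu_capacity": 2000},
]


def candidates(pod: dict) -> list:
    return [n["name"] for n in nodes if feasible(n, pod)]


print(candidates({"cpu_request": "1"}))     # []
print(candidates({"cpu_request": "500m"}))  # ['kubernetes-node-6ta5', 'kubernetes-node-wul5']
```

A hypothetical `nodeSelector` such as `{"disktype": "ssd"}` would rule out both nodes regardless of CPU, because neither carries that label.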
To correct this situation, use `kubectl scale` to update your Deployment to four or fewer replicas. (Alternatively, you can leave the one Pod pending, which is harmless.)
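If you drive kubectl from scripts, it helps to build the command as an argument list rather than as a shell string. Below is a minimal sketch. `scale_command` and `scale` are hypothetical helpers. Running `scale` assumes `kubectl` is on `PATH` and configured for the target cluster.

```python
import shlex
import subprocess
from typing import List, Optional


def scale_command(deployment: str, replicas: int, namespace: Optional[str] = None) -> List[str]:
    """Build the argv for `kubectl scale deployment/<name> --replicas=<n>`."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    argv = ["kubectl", "scale", f"deployment/{deployment}", f"--replicas={replicas}"]
    if namespace:
        argv.append(f"--namespace={namespace}")
    return argv


def scale(deployment: str, replicas: int, namespace: Optional[str] = None) -> None:
    # Requires kubectl on PATH and a kubeconfig pointing at the cluster.
    subprocess.run(scale_command(deployment, replicas, namespace), check=True)


print(shlex.join(scale_command("nginx-deployment", 4)))
# kubectl scale deployment/nginx-deployment --replicas=4
```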
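Events like the ones at the end of the `kubectl describe pod` output are persisted in etcd and provide high-level information about what is happening in the cluster. To list all events, use:

```shell
kubectl get events
```

Keep in mind, however, that events are namespaced. If you are interested in events for a namespaced object (for example, what happened to the Pods in namespace `my-namespace`), you need to pass the namespace to the command explicitly:

```shell
kubectl get events --namespace=my-namespace
```

To see events from all namespaces, use the `--all-namespaces` flag.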
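Across all namespaces the event list gets long, so filtering it programmatically can help. Below is a minimal sketch that groups Warning events by namespace. It assumes the input is the output of `kubectl get events --all-namespaces -o json`, a List whose items carry `type`, `reason`, `count` and `involvedObject`. The sample data is hypothetical, and the pod `web-0` is invented for illustration.

```python
import json
from collections import defaultdict


def warnings_by_namespace(events_json: str) -> dict:
    """Group Warning events from `kubectl get events -o json` output by namespace."""
    grouped = defaultdict(list)
    for ev in json.loads(events_json).get("items", []):
        if ev.get("type") != "Warning":
            continue
        obj = ev.get("involvedObject", {})
        grouped[obj.get("namespace", "")].append(
            f'{obj.get("kind")}/{obj.get("name")}: {ev.get("reason")} (x{ev.get("count", 1)})'
        )
    return dict(grouped)


sample = json.dumps({"kind": "List", "items": [
    {"type": "Warning", "reason": "FailedScheduling", "count": 7,
     "involvedObject": {"kind": "Pod", "namespace": "default",
                        "name": "nginx-deployment-1370807587-fz9sd"}},
    {"type": "Normal", "reason": "Pulled", "count": 1,
     "involvedObject": {"kind": "Pod", "namespace": "my-namespace", "name": "web-0"}},
]})
print(warnings_by_namespace(sample))
```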
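Besides `kubectl describe pod`, another way to get extra information about a pod (beyond what `kubectl get pod` provides) is to pass the `-o yaml` output format flag to `kubectl get pod`. This returns, in YAML format, even more information than `kubectl describe pod`: essentially everything the system knows about the Pod. Here you will see things like annotations (key-value metadata without the label restrictions, used internally by k8s system components), the restart policy, ports, and volumes.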
```shell
$ kubectl get pod nginx-deployment-1006230814-6winp -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-1006230814","uid":"4c84c175-f161-11e5-9a78-42010af00005","apiVersion":"extensions","resourceVersion":"133434"}}
  creationTimestamp: 2016-03-24T01:39:50Z
  generateName: nginx-deployment-1006230814-
  labels:
    app: nginx
    pod-template-hash: "1006230814"
  name: nginx-deployment-1006230814-6winp
  namespace: default
  resourceVersion: "133447"
  selfLink: /api/v1/namespaces/default/pods/nginx-deployment-1006230814-6winp
  uid: 4c879808-f161-11e5-9a78-42010af00005
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 500m
        memory: 128Mi
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-4bcbi
      readOnly: true
  dnsPolicy: ClusterFirst
  nodeName: kubernetes-node-wul5
  restartPolicy: Always
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-4bcbi
    secret:
      secretName: default-token-4bcbi
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2016-03-24T01:39:51Z
    status: "True"
    type: Ready
  containerStatuses:
  - containerID: docker://90315cc9f513c724e9957a4788d3e625a078de84750f244a40f97ae355eb1149
    image: nginx
    imageID: docker://6f62f48c4e55d700cf3eb1b5e33fa051802986b77b874cc351cce539e5163707
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2016-03-24T01:39:51Z
  hostIP: 10.240.0.9
  phase: Running
  podIP: 10.244.0.6
  startTime: 2016-03-24T01:39:49Z
```
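The same object is available as JSON with `-o json`, which is convenient to load with `json.loads`. Below is a minimal sketch that pulls the main debugging fields out of it. It computes, for each running container, how long after the Pod's `startTime` the container started. The dictionary is an excerpt of the YAML above. `summarize` is a hypothetical helper, and the timestamp format assumes UTC values ending in `Z`, as shown.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # UTC timestamps as printed above, e.g. "2016-03-24T01:39:49Z"
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def summarize(pod: dict) -> dict:
    status = pod["status"]
    pod_start = parse_ts(status["startTime"])
    containers = {}
    for cs in status.get("containerStatuses", []):
        running = cs.get("state", {}).get("running")
        delay = None
        if running:
            delay = (parse_ts(running["startedAt"]) - pod_start).total_seconds()
        containers[cs["name"]] = {
            "ready": cs.get("ready", False),
            "restarts": cs.get("restartCount", 0),
            "start_delay_s": delay,
        }
    return {
        "phase": status.get("phase"),
        "node": pod.get("spec", {}).get("nodeName"),
        "podIP": status.get("podIP"),
        "containers": containers,
    }


pod = {
    "spec": {"nodeName": "kubernetes-node-wul5", "restartPolicy": "Always"},
    "status": {
        "phase": "Running",
        "podIP": "10.244.0.6",
        "startTime": "2016-03-24T01:39:49Z",
        "containerStatuses": [{
            "name": "nginx", "ready": True, "restartCount": 0,
            "state": {"running": {"startedAt": "2016-03-24T01:39:51Z"}},
        }],
    },
}
print(summarize(pod))
```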
# Example: debugging a down or unreachable node
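Sometimes when debugging it is useful to look at the status of a node, for example because you have noticed strange behavior in a Pod running on that node, or because you want to find out why a Pod won't schedule onto it. As with Pods, you can use `kubectl describe node` and `kubectl get node -o yaml` to retrieve detailed information about nodes. For example, here is what you will see if a node is down (disconnected from the network, or the kubelet has died and won't restart, and so on). Note that the node is reported as NotReady and that its conditions are Unknown because the kubelet stopped posting node status. Pods on such a node are evicted once it has been NotReady for five minutes.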
```shell
$ kubectl get nodes
NAME                   STATUS     AGE   VERSION
kubernetes-node-861h   NotReady   1h    v1.6.0+fff5156
kubernetes-node-bols   Ready      1h    v1.6.0+fff5156
kubernetes-node-st6x   Ready      1h    v1.6.0+fff5156
kubernetes-node-unaj   Ready      1h    v1.6.0+fff5156
```
```shell
$ kubectl describe node kubernetes-node-861h
Name:               kubernetes-node-861h
Role
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=kubernetes-node-861h
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Mon, 04 Sep 2017 17:13:23 +0800
Phase:
Conditions:
  Type            Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----            ------    -----------------                 ------------------                ------              -------
  OutOfDisk       Unknown   Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  MemoryPressure  Unknown   Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure    Unknown   Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready           Unknown   Fri, 08 Sep 2017 16:04:28 +0800   Fri, 08 Sep 2017 16:20:58 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:          10.240.115.55,104.197.0.26
Capacity:
 cpu:        2
 hugePages:  0
 memory:     4046788Ki
 pods:       110
Allocatable:
 cpu:        1500m
 hugePages:  0
 memory:     1479263Ki
 pods:       110
System Info:
 Machine ID:                 8e025a21a4254e11b028584d9d8b12c4
 System UUID:                349075D1-D169-4F25-9F2A-E886850C47E3
 Boot ID:                    5cd18b37-c5bd-4658-94e0-e436d3f110e0
 Kernel Version:             4.4.0-31-generic
 OS Image:                   Debian GNU/Linux 8 (jessie)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.12.5
 Kubelet Version:            v1.6.9+a3d1dfa6f4335
 Kube-Proxy Version:         v1.6.9+a3d1dfa6f4335
ExternalID:                  15233045891481496305
Non-terminated Pods:         (9 in total)
  Namespace    Name    CPU Requests    CPU Limits    Memory Requests    Memory Limits
  ---------    ----    ------------    ----------    ---------------    -------------
  ......
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests    CPU Limits      Memory Requests     Memory Limits
  ------------    ----------      ---------------     -------------
  900m (60%)      2200m (146%)    1009286400 (66%)    5681286400 (375%)
Events:         <none>
```
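The percentages under `Allocated resources` are computed against the node's Allocatable values rather than its Capacity. For example, 900m of the 1500m allocatable CPU is 60%, whereas against the 2-CPU capacity it would be 45%. Below is a minimal sketch that reproduces the four figures from the quantities shown above. The quantity parsers handle only the suffixes used here (`m` for CPU, and `Ki`/`Mi`/`Gi` or plain bytes for memory) and are hypothetical helpers, not a full Kubernetes quantity parser.

```python
from decimal import Decimal

# Binary suffixes used by memory quantities (Ki = 1024 bytes, and so on).
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_millicores(quantity: str) -> int:
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count such as "1009286400"


def percent(used: int, allocatable: int) -> int:
    # Integer percentage, truncated toward zero (146.67 becomes 146).
    return used * 100 // allocatable


alloc_cpu = cpu_millicores("1500m")
alloc_mem = memory_bytes("1479263Ki")
print(percent(cpu_millicores("900m"), alloc_cpu))   # 60
print(percent(cpu_millicores("2200m"), alloc_cpu))  # 146
print(percent(1009286400, alloc_mem))               # 66
print(percent(5681286400, alloc_mem))               # 375
```

The last figure shows why the output warns about overcommitment: the sum of memory limits is well above what the node can actually allocate.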
```shell
$ kubectl get node kubernetes-node-861h -o yaml
apiVersion: v1
kind: Node
metadata:
  creationTimestamp: 2015-07-10T21:32:29Z
  labels:
    kubernetes.io/hostname: kubernetes-node-861h
  name: kubernetes-node-861h
  resourceVersion: "757"
  selfLink: /api/v1/nodes/kubernetes-node-861h
  uid: 2a69374e-274b-11e5-a234-42010af0d969
spec:
  externalID: "15233045891481496305"
  podCIDR: 10.244.0.0/24
  providerID: gce://striped-torus-760/us-central1-b/kubernetes-node-861h
status:
  addresses:
  - address: 10.240.115.55
    type: InternalIP
  - address: 104.197.0.26
    type: ExternalIP
  capacity:
    cpu: "1"
    memory: 3800808Ki
    pods: "100"
  conditions:
  - lastHeartbeatTime: 2015-07-10T21:34:32Z
    lastTransitionTime: 2015-07-10T21:35:15Z
    reason: Kubelet stopped posting node status.
    status: Unknown
    type: Ready
  nodeInfo:
    bootID: 4e316776-b40d-4f78-a4ea-ab0d73390897
    containerRuntimeVersion: docker://Unknown
    kernelVersion: 3.16.0-0.bpo.4-amd64
    kubeProxyVersion: v0.21.1-185-gffc5a86098dc01
    kubeletVersion: v0.21.1-185-gffc5a86098dc01
    machineID: ""
    osImage: Debian GNU/Linux 7 (wheezy)
    systemUUID: ABE5F6B4-D44B-108B-C46A-24CCE16C8B6E
```
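To spot such nodes across a cluster, you can check each node's `Ready` condition in the output of `kubectl get nodes -o json` (a List with an `items` array). Below is a minimal sketch. The node `kubernetes-node-861h` is an excerpt of the output above, and `kubernetes-node-bols` is a hypothetical healthy node. `EVICTION_DELAY` uses the five-minute figure quoted in the text. The actual delay in your cluster depends on controller-manager settings and Pod tolerations, so treat the estimate as an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the five-minute NotReady-to-eviction delay quoted in the text.
EVICTION_DELAY = timedelta(minutes=5)


def ready_condition(node: dict) -> Optional[dict]:
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond
    return None


def is_ready(node: dict) -> bool:
    cond = ready_condition(node)
    return cond is not None and cond.get("status") == "True"


def not_ready_nodes(node_list: dict) -> list:
    return [n["metadata"]["name"] for n in node_list.get("items", []) if not is_ready(n)]


def eviction_estimate(node: dict) -> Optional[datetime]:
    """Rough time at which Pods would be evicted, or None if the node is Ready."""
    cond = ready_condition(node)
    if cond is None or cond.get("status") == "True":
        return None
    since = datetime.strptime(cond["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ")
    return since + EVICTION_DELAY


node_861h = {"metadata": {"name": "kubernetes-node-861h"},
             "status": {"conditions": [{"type": "Ready", "status": "Unknown",
                                        "lastTransitionTime": "2015-07-10T21:35:15Z"}]}}
node_bols = {"metadata": {"name": "kubernetes-node-bols"},
             "status": {"conditions": [{"type": "Ready", "status": "True",
                                        "lastTransitionTime": "2015-07-10T21:30:00Z"}]}}
nodes = {"kind": "List", "items": [node_861h, node_bols]}
print(not_ready_nodes(nodes), eviction_estimate(node_861h))
```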