Kubernetes cluster does not run after reboot

윤태일

When I run kubectl after a reboot, I get this error: "The connection to the server x.x.x.x:6443 was refused - did you specify the right host or port?"

When I check the containers with docker ps, kube-apiserver and kube-scheduler keep starting and stopping.

Why is this happening?

root@taeil-linux:/etc/systemd/system/kubelet.service.d# cd
root@taeil-linux:~# kubectl get nodes
The connection to the server 10.0.0.152:6443 was refused - did you specify the right host or port?
root@taeil-linux:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED                 STATUS              PORTS               NAMES
root@taeil-linux:~# docker images
REPOSITORY                           TAG                 IMAGE ID                CREATED             SIZE
k8s.gcr.io/kube-proxy                v1.15.3                 232b5c793146        2 weeks ago         82.4MB
k8s.gcr.io/kube-apiserver            v1.15.3                 5eb2d3fc7a44        2 weeks ago         207MB
k8s.gcr.io/kube-scheduler            v1.15.3                 703f9c69a5d5        2 weeks ago         81.1MB
k8s.gcr.io/kube-controller-manager   v1.15.3                 e77c31de5547        2 weeks ago         159MB
node                                 carbon                  c83f74dcf58e        3 weeks ago         895MB
kubernetesui/dashboard               v2.0.0-beta1            4640949a39e6        2 months ago        64.6MB
weaveworks/weave-kube                2.5.2                   f04a043bb67a        3 months ago        148MB
weaveworks/weave-npc                 2.5.2                   5ce48e0d813c        3 months ago        49.6MB
kubernetesui/metrics-scraper         v1.0.0                  44390ebe2b73        4 months ago        36.8MB
k8s.gcr.io/coredns                   1.3.1                   eb516548c180        7 months ago        40.3MB
k8s.gcr.io/etcd                      3.3.10                  2c4adeb21b4f        9 months ago        258MB
quay.io/coreos/flannel               v0.10.0-amd64           f0fad859c909        19 months ago       44.6MB
k8s.gcr.io/pause                     3.1                     da86e6ba6ca1        20 months ago       742kB

root@taeil-linux:~# systemctl status kubelet

● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Fri 2019-09-06 14:29:25 KST; 4min 19s ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 14470 (kubelet)
    Tasks: 19 (limit: 4512)
   CGroup: /system.slice/kubelet.service
           └─14470 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --resolv-con

 9월 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.800330       14470 pod_workers.go:190] Error syncing pod 9a745ac0a776afabd0d387fd0fcb2f54 ("kube-apiserver-taeil-linux_kube-system(9a745ac0a776afabd0d387fd0fcb2f54)"), skipping: failed to "CreatePodSandbox" for "kube-apiserver-ta
 9월 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.897945       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.916566       14470 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.0.152:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dtaeil-linux&limit=500&resourceVersion=0: dia
 9월 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.998190       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.098439       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.198732       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.299052       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.399343       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.499561       14470 kubelet.go:2248] node "taeil-linux" not found
 9월 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.599723       14470 kubelet.go:2248] node "taeil-linux" not found

root@taeil-linux:~# systemctl status kube-apiserver

Unit kube-apiserver.service could not be found.

This is what docker logs shows for the kube-apiserver container:

Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I0906 10:54:19.636649       1 server.go:560] external host was not specified, using 10.0.0.152
I0906 10:54:19.636954       1 server.go:147] Version: v1.15.3
I0906 10:54:21.753962       1 plugins.go:158] Loaded 10 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,MutatingAdmissionWebhook.
I0906 10:54:21.753988       1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
E0906 10:54:21.754660       1 prometheus.go:55] failed to register depth metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754701       1 prometheus.go:68] failed to register adds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754787       1 prometheus.go:82] failed to register latency metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754842       1 prometheus.go:96] failed to register workDuration metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754883       1 prometheus.go:112] failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754918       1 prometheus.go:126] failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754952       1 prometheus.go:152] failed to register depth metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754986       1 prometheus.go:164] failed to register adds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755047       1 prometheus.go:176] failed to register latency metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755104       1 prometheus.go:188] failed to register work_duration metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755152       1 prometheus.go:203] failed to register unfinished_work_seconds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755188       1 prometheus.go:216] failed to register longest_running_processor_microseconds metric admission_quota_controller: duplicate metrics collector registration attempted
I0906 10:54:21.755215       1 plugins.go:158] Loaded 10 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,MutatingAdmissionWebhook.
I0906 10:54:21.755226       1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
I0906 10:54:21.757263       1 client.go:354] parsed scheme: ""
I0906 10:54:21.757280       1 client.go:354] scheme "" not registered, fallback to default scheme
I0906 10:54:21.757335       1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{127.0.0.1:2379 0  <nil>}]
I0906 10:54:21.757402       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{127.0.0.1:2379 <nil>}]
W0906 10:54:21.757666       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
I0906 10:54:22.753069       1 client.go:354] parsed scheme: ""
I0906 10:54:22.753118       1 client.go:354] scheme "" not registered, fallback to default scheme
I0906 10:54:22.753204       1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{127.0.0.1:2379 0  <nil>}]
I0906 10:54:22.753354       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{127.0.0.1:2379 <nil>}]
W0906 10:54:22.753855       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:22.757983       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:23.754019       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:24.430000       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:25.279869       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:26.931974       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:28.198719       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:30.825660       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:32.850511       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:36.294749       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:38.737408       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
F0906 10:54:41.757603       1 storage_decorator.go:57] Unable to create storage backend: config (&{ /registry {[https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt} true 0xc00063dd40 apiextensions.k8s.io/v1beta1 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.1:2379: connect: connection refused)
윤태일

Long time no see.

I figured out how to solve this problem!

If you get this error for no apparent reason, you can fix it by removing the old containers:

docker rm $(docker ps -a -q)

Perhaps the existing Kubernetes containers were left in a broken state by the reboot, and that caused the newly started containers to crash.

watch docker ps

If you watch the containers this way, you can see kube-apiserver and the others shut down within a minute of starting.
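To find out why a component keeps exiting, it can help to read the logs of its most recent container. The kube-apiserver log in the question fails on 127.0.0.1:2379, the etcd client address, so etcd is a natural one to look at. Below is a minimal sketch; the function names are made up for illustration, and it assumes the kubelet names its Docker containers "k8s_<component>_..." and that docker ps lists the newest container first.

```shell
#!/bin/sh
# Print the ID of the newest container (running or exited) for a component,
# e.g. "etcd" or "kube-apiserver".
# Assumption: container names start with "k8s_<component>".
latest_container_id() {
    docker ps -a -q --filter "name=k8s_$1" | head -n 1
}

# Show the last 20 log lines of that container.
show_component_logs() {
    id=$(latest_container_id "$1")
    if [ -z "$id" ]; then
        echo "no container found for $1"
        return 1
    fi
    docker logs --tail 20 "$id"
}
```

For example, after sourcing the functions, show_component_logs etcd prints the tail of the etcd container's log.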

So I deleted all the containers listed by docker ps -a, and that fixed it!
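Note that docker rm without -f refuses to remove running containers, so the command above only clears stopped ones, but it clears every stopped container on the host, not just the Kubernetes ones. If other containers live on the same machine, a narrower variant might be safer. This is only a sketch under the same assumption that kubelet-created containers carry a "k8s_" name prefix; the function names are hypothetical.

```shell
#!/bin/sh
# List the IDs of exited containers whose name contains "k8s_".
# Assumption: only kubelet-created containers use that prefix.
list_exited_k8s_containers() {
    docker ps -a -q --filter "status=exited" --filter "name=k8s_"
}

# Remove those containers and leave everything else alone.
remove_exited_k8s_containers() {
    ids=$(list_exited_k8s_containers)
    if [ -z "$ids" ]; then
        echo "no exited k8s containers found"
        return 0
    fi
    # Word splitting on $ids is intentional: one argument per container ID.
    docker rm $ids
}
```

After running remove_exited_k8s_containers, watch docker ps should show whether the kubelet brings the control-plane containers back up and keeps them running.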
