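Description
I am relatively new to Kubernetes. I can run the cluster using the default socket (/var/run/dockershim.sock), but I noticed that when pulling images from a private repository through the crio socket, the speeds are not even close in comparison.
I am trying to configure all nodes to use the crio socket (/var/run/crio/crio.sock), but I cannot start the master node with this socket.
I followed the Kubernetes documentation, "Configuring each kubelet in your cluster using kubeadm", and also the cri-o documentation on GitHub.
Unfortunately, I cannot make it work, because it seems to ignore the private repository flag (--image-repository).
Steps to reproduce the issue: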
kubeadm init \
--upload-certs \
--cri-socket=/var/run/crio/crio.sock \
--node-name=my_node_name \
--image-repository=my.private.repo \
--pod-network-cidr=10.96.0.0/16 \
--kubernetes-version=v1.18.2 \
--control-plane-endpoint=ip:6443 \
--apiserver-cert-extra-sans=ip \
--apiserver-advertise-address=ip
journalctl -xeu crio -f
Describe the results you received:
Sample of the logs from crio in debug mode:
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043499089+02:00" level=debug msg="Trying to access \"k8s.gcr.io/pause:3.2\"" file="docker/docker_image_src.go:68"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043547722+02:00" level=debug msg="Credentials not found" file="config/config.go:123"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043576124+02:00" level=debug msg="Using registries.d directory /etc/containers/registries.d for sigstore configuration" file="docker/lookaside.go:51"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043706369+02:00" level=debug msg=" Using \"default-docker\" configuration" file="docker/lookaside.go:169"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043736378+02:00" level=debug msg=" No signature storage configuration found for k8s.gcr.io/pause:3.2" file="docker/lookaside.go:174"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043769424+02:00" level=debug msg="Looking for TLS certificates and private keys in /etc/docker/certs.d/k8s.gcr.io" file="tlsclientconfig/tlsclientconfig.go:21"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043858410+02:00" level=debug msg="GET https://k8s.gcr.io/v2/" file="docker/docker_client.go:516"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.046154250+02:00" level=debug msg="Ping https://k8s.gcr.io/v2/ err Get \"https://k8s.gcr.io/v2/\": dial tcp 10.254.3.15:443: connect: connection refused (&url.Error{Op:\"Get\", URL:\"https://k8s.gcr.io/v2/\", Err:(*net.OpError)(0xc00084d5e0)})" file="docker/docker_client.go:708"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.046239456+02:00" level=debug msg="GET https://k8s.gcr.io/v1/_ping" file="docker/docker_client.go:516"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.048653448+02:00" level=debug msg="Ping https://k8s.gcr.io/v1/_ping err Get \"https://k8s.gcr.io/v1/_ping\": dial tcp 10.254.3.15:443: connect: connection refused (&url.Error{Op:\"Get\", URL:\"https://k8s.gcr.io/v1/_ping\", Err:(*net.OpError)(0xc0006b0690)})" file="docker/docker_client.go:735"
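To confirm which registry CRI-O actually contacted, the debug log can be scanned for the image references it tried and the registry hosts it requested. The following is a minimal Python 3 sketch; the regular expressions are written against the sample lines above only and are an assumption, not a documented log format. Applied to those lines it reports only k8s.gcr.io, never my.private.repo, which is the symptom described here.

```python
import re

# Matches lines such as: msg="Trying to access \"k8s.gcr.io/pause:3.2\""
TRYING_RE = re.compile(r'Trying to access \\"([^"\\]+)\\"')
# Matches lines such as: msg="GET https://k8s.gcr.io/v2/"
GET_RE = re.compile(r'msg="GET https://([^/"]+)/')


def registries_contacted(log_lines):
    """Return (image references tried, registry hosts contacted), in first-seen order."""
    images, hosts = [], []
    for line in log_lines:
        for ref in TRYING_RE.findall(line):
            if ref not in images:
                images.append(ref)
        for host in GET_RE.findall(line):
            if host not in hosts:
                hosts.append(host)
    return images, hosts
```

For example, `registries_contacted(open("crio-debug.log"))` on a file saved from `journalctl -u crio --no-pager` (the file name is illustrative).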
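Describe the results you expected:
Start the node using the crio socket.
Additional information you deem important (e.g. issue happens only occasionally):
If I start the node with the default socket, e.g.: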
# kubeadm init \
--upload-certs \
--cri-socket=/var/run/dockershim.sock \
--node-name=my_node_name \
--image-repository=my.private.repo \
--pod-network-cidr=10.96.0.0/16 \
--kubernetes-version=v1.18.2 \
--control-plane-endpoint=ip:6443 \
--apiserver-cert-extra-sans=ip \
--apiserver-advertise-address=ip
W0630 20:24:33.223266 29033 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0630 20:24:35.839949 29033 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0630 20:24:35.841420 29033 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 11.003647 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
key
[mark-control-plane] Marking the node hostname as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node hostname as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: token
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join ip:6443 --token token \
--discovery-token-ca-cert-hash sha256:hash \
--control-plane --certificate-key key
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join ip:6443 --token token \
--discovery-token-ca-cert-hash sha256:hash
If I start the node with the crio socket:
# kubeadm init \
--upload-certs \
--cri-socket=/var/run/crio/crio.sock \
--node-name=my_node_name \
--image-repository=my.private.repo \
--pod-network-cidr=10.96.0.0/16 \
--kubernetes-version=v1.18.2 \
--control-plane-endpoint=ip:6443 \
--apiserver-cert-extra-sans=ip \
--apiserver-advertise-address=ip
W0630 20:32:33.827957 2916 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [hostname kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.96.134.57 10.96.134.57 10.96.134.57]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [hostname localhost] and IPs [10.96.134.57 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [hostname localhost] and IPs [10.96.134.57 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0630 20:32:37.829806 2916 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0630 20:32:37.830826 2916 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
- 'crictl --runtime-endpoint /var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint /var/run/crio/crio.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
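kubeadm suggests listing the control-plane containers with crictl. The same filter can be scripted; this is a sketch assuming Python 3.7+ and crictl on the PATH, and `crictl_ps_all` and `kube_containers` are hypothetical helper names.

```python
import subprocess

CRIO_ENDPOINT = "/var/run/crio/crio.sock"  # socket path used in this issue


def crictl_ps_all(endpoint=CRIO_ENDPOINT):
    """Run `crictl --runtime-endpoint <endpoint> ps -a` and return its stdout."""
    result = subprocess.run(
        ["crictl", "--runtime-endpoint", endpoint, "ps", "-a"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def kube_containers(ps_output):
    """Keep lines containing "kube" but not "pause", like `| grep kube | grep -v pause`."""
    return [line for line in ps_output.splitlines()
            if "kube" in line and "pause" not in line]
```

The container IDs in the first column of the remaining lines are what `crictl ... logs CONTAINERID` expects.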
I can see that localhost is listening on port 10248:
# curl -sSL http://localhost:10248/healthz
ok
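kubeadm's [kubelet-check] makes the same HTTP call as the curl command above. Since the manual check was run after kubeadm had already timed out, polling the endpoint over time tells more than a single request. A minimal Python 3 sketch; the attempt count and delay are arbitrary illustrative values, and `fetch` is injectable so the polling logic can be tried without a running kubelet.

```python
import time
import urllib.request

HEALTHZ_URL = "http://localhost:10248/healthz"  # kubelet healthz endpoint from the kubeadm output


def fetch_healthz(url=HEALTHZ_URL, timeout=2.0):
    """Return the response body, or None if the kubelet is not reachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return None


def wait_for_kubelet(fetch=fetch_healthz, attempts=5, delay=1.0, sleep=time.sleep):
    """Poll until healthz answers "ok"; return True on success, False after all attempts."""
    for i in range(attempts):
        if fetch() == "ok":
            return True
        if i < attempts - 1:
            sleep(delay)
    return False
```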
Sample query against the crio socket (as described in the documentation):
# curl -v --unix-socket /var/run/crio/crio.sock http://localhost/info | jq
* About to connect() to localhost port 80 (#0)
* Trying /var/run/crio/crio.sock...
* Failed to set TCP_KEEPIDLE on fd 3
* Failed to set TCP_KEEPINTVL on fd 3
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to localhost (/var/run/crio/crio.sock) port 80 (#0)
> GET /info HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Tue, 30 Jun 2020 18:36:35 GMT
< Content-Length: 240
<
{ [data not shown]
100 240 100 240 0 0 144k 0 --:--:-- --:--:-- --:--:-- 234k
* Connection #0 to host localhost left intact
{
  "storage_driver": "overlay2",
  "storage_root": "/var/lib/containers/storage",
  "cgroup_driver": "systemd",
  "default_id_mappings": {
    "uids": [
      {
        "container_id": 0,
        "host_id": 0,
        "size": 4294967295
      }
    ],
    "gids": [
      {
        "container_id": 0,
        "host_id": 0,
        "size": 4294967295
      }
    ]
  }
}
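The same /info request can be made from code by sending HTTP over the Unix socket instead of TCP. A minimal sketch using only the Python 3 standard library; `UnixHTTPConnection` and `crio_info` are hypothetical helper names, and the socket path is the one used in this issue.

```python
import http.client
import json
import socket

CRIO_SOCKET = "/var/run/crio/crio.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of a TCP port."""

    def __init__(self, socket_path, timeout=5.0):
        super().__init__("localhost", timeout=timeout)  # host only fills the Host header
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def crio_info(socket_path=CRIO_SOCKET):
    """Equivalent of: curl --unix-socket /var/run/crio/crio.sock http://localhost/info"""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/info")
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"unexpected HTTP status {resp.status}")
        return json.loads(resp.read())
    finally:
        conn.close()
```

On the node above, `crio_info()["cgroup_driver"]` would return "systemd".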
Output of kubelet status:
# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2020-06-30 20:39:49 CEST; 6s ago
Docs: https://kubernetes.io/docs/
Main PID: 8502 (kubelet)
Tasks: 15
Memory: 20.1M
CGroup: /system.slice/kubelet.service
└─8502 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --hostname-override=hostname
Jun 30 20:39:55 hostname kubelet[8502]: I0630 20:39:55.369441 8502 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Jun 30 20:39:55 hostname kubelet[8502]: I0630 20:39:55.399015 8502 kubelet_node_status.go:70] Attempting to register node hostname
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.403707 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.503871 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.604115 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.704324 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.769448 8502 kubelet_node_status.go:92] Unable to register node "hostname" with API server: Post https://ip:6443/api/v1/nodes: dial tcp ip:6443: connect: connection refused
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.805779 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.906014 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:56 hostname kubelet[8502]: E0630 20:39:56.007272 8502 kubelet.go:2267] node "hostname" not found
These errors are expected at this point, since I have not started the network container yet; from what little I know, the network errors are not relevant.
Output of crio --version:
# crio --version
crio version 1.18.2
Version: 1.18.2
GitCommit: 7f261aeebffed079b4475dde8b9d602b01973d33
GitTreeState: clean
BuildDate: 2020-06-18T21:05:27Z
GoVersion: go1.14
Compiler: gc
Platform: linux/amd64
Linkmode: static
Output of kubelet --version:
# kubelet --version
Kubernetes v1.18.2
Output of Linux OS version:
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)
Additional environment details (AWS, VirtualBox, physical, etc.):
The installation is applied on bare-metal nodes.
Sample kubelet file:
# cat /etc/default/kubelet
KUBELET_EXTRA_ARGS=--feature-gates="AllAlpha=false,RunAsGroup=true" --container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint='unix:///var/run/crio/crio.sock' --runtime-request-timeout=5m
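CRI-O's /info output above reports "cgroup_driver": "systemd", and this kubelet file passes --cgroup-driver=systemd; the kubelet and the runtime have to agree on this value. A sketch of a consistency check, assuming the two inputs shown in this issue (a KUBELET_EXTRA_ARGS line or a KubeletConfiguration file). It scans text with regular expressions rather than parsing YAML, lets a command-line flag win over the config file, and falls back to cgroupfs, the kubelet's default, when nothing is set.

```python
import re


def kubelet_cgroup_driver(extra_args=None, config_text=None):
    """Return the kubelet cgroup driver from a flags line and/or a KubeletConfiguration text."""
    if extra_args:
        m = re.search(r"--cgroup-driver=(\S+)", extra_args)
        if m:
            return m.group(1).strip("'\"")
    if config_text:
        # Top-level key only; this is a text scan, not a YAML parser.
        m = re.search(r"^cgroupDriver:\s*(\S+)", config_text, re.MULTILINE)
        if m:
            return m.group(1)
    return "cgroupfs"  # kubelet default when neither source sets it


def drivers_match(crio_info, kubelet_driver):
    """True if CRI-O's /info cgroup_driver equals the kubelet's cgroup driver."""
    return crio_info.get("cgroup_driver") == kubelet_driver
```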
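Update: I opened a ticket on GitHub, "Kubernetes v1.18.2 with crio version 1.18.2 failing to sync with kubelet on RH7 #3915". It looks like a bug: cri-o does not handle the remote repository and tries to pull from the default repository, k8s.gcr.io. I will update the ticket as soon as I know more.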
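So this problem is not exactly a bug in CRI-O, as I (and the CRI-O dev team) initially thought. Rather, a user who wants to use CRI-O as the CRI for Kubernetes together with a private repository apparently has to apply quite a lot of configuration.
I will not describe the whole CRI-O configuration here, since it is already documented in the ticket I opened with the team, "Kubernetes v1.18.2 with crio version 1.18.2 failing to sync with kubelet on RH7 #3915".
The first configuration that has to be applied is the container registries configuration, which controls where the images are pulled from: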
$ cat /etc/containers/registries.conf
[[registry]]
prefix = "k8s.gcr.io"
insecure = false
blocked = false
location = "k8s.gcr.io"
[[registry.mirror]]
location = "my.private.repo"
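With this entry, a pull of an image under k8s.gcr.io is tried against my.private.repo first and against k8s.gcr.io only as a fallback, so the private registry has to serve the images under the same path (for example my.private.repo/pause:3.2). The sketch below models that lookup order in Python. It is a simplification of containers-registries.conf(5) (longest matching prefix, mirrors in order, then the primary location) that ignores other options such as insecure and blocked; `Registry` and `pull_candidates` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """One [[registry]] table: prefix, primary location and ordered mirror locations."""
    prefix: str
    location: str
    mirrors: list = field(default_factory=list)


def _matches(image, prefix):
    # The prefix must end at a path, tag or digest boundary:
    # "k8s.gcr.io" matches "k8s.gcr.io/pause:3.2" but not "k8s.gcr.io.example/x".
    if not image.startswith(prefix):
        return False
    rest = image[len(prefix):]
    return rest == "" or rest[0] in "/:@"


def pull_candidates(image, registries):
    """Return the references tried for `image`: each mirror in order, then the primary location."""
    matching = [r for r in registries if _matches(image, r.prefix)]
    if not matching:
        return [image]  # no [[registry]] entry applies: the reference is used unchanged
    reg = max(matching, key=lambda r: len(r.prefix))  # longest prefix wins
    suffix = image[len(reg.prefix):]
    return [mirror + suffix for mirror in reg.mirrors] + [reg.location + suffix]
```

With the configuration above, `pull_candidates("k8s.gcr.io/pause:3.2", [Registry("k8s.gcr.io", "k8s.gcr.io", ["my.private.repo"])])` yields the mirror reference first, then the original one.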
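CRI-O recommends passing this configuration to the kubelet as flags (haircommander/cri-o-kubeadm), but for me this configuration alone did not work.
I went back to the Kubernetes documentation, which recommends putting the kubelet settings in the /var/lib/kubelet/config.yaml file at runtime instead of passing them as flags. In my case this was not possible, because the node has to start with the CRI-O socket and not with any other socket (see "Configure cgroup driver used by kubelet on control-plane node").
So I managed to get it up and running by passing this flag in the configuration file below: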
$ cat /tmp/config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/crio/crio.sock
  name: node.name
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
controlPlaneEndpoint: 1.2.3.4:6443
imageRepository: my.private.repo
kind: ClusterConfiguration
kubernetesVersion: v1.18.2
networking:
  dnsDomain: cluster.local
  podSubnet: 10.85.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
Then the user only needs to start the master/worker node with the flag --config <file.yml>, and the node starts successfully.
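When the same setup is repeated on several clusters, the file above can be generated from the few values that change. A sketch in Python using string.Template; `render_config` and `init_command` are hypothetical helpers, and the template is the configuration above with the address, node name and image repository turned into placeholders.

```python
from string import Template

# The configuration above with per-cluster values turned into $placeholders.
CONFIG_TEMPLATE = Template("""\
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: $address
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/crio/crio.sock
  name: $node_name
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
controlPlaneEndpoint: $address:6443
imageRepository: $image_repository
kind: ClusterConfiguration
kubernetesVersion: v1.18.2
networking:
  dnsDomain: cluster.local
  podSubnet: 10.85.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
""")


def render_config(address, node_name, image_repository):
    """Fill in the template; raises KeyError if a placeholder value is missing."""
    return CONFIG_TEMPLATE.substitute(
        address=address, node_name=node_name, image_repository=image_repository
    )


def init_command(config_path):
    """Argument list for `kubeadm init --config <config_path>`."""
    return ["kubeadm", "init", "--config", config_path]
```

The rendered text would be written to /tmp/config.yaml and the command run as root, e.g. `subprocess.run(init_command("/tmp/config.yaml"), check=True)`.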
I hope all the information here helps someone else.