Been at this all day. I have three master nodes using keepalived for a virtual IP, with LVS enabled. If I shut down any one of the three, the other two keep working fine, but as soon as I shut down two, the service becomes unavailable.
[root@master-1 ~]# kubectl get nodes
The connection to the server 192.168.0.8:6443 was refused - did you specify the right host or port?
[root@master-1 ~]# netstat -ntlp |grep 6443
[root@master-1 ~]# docker ps -a |grep kube-api|grep -v pause
0c1c0042b8c2 53224b502ea4 "kube-apiserver --ad…" About a minute ago Exited (1) 54 seconds ago k8s_kube-apiserver_kube-apiserver-master-1.host.com_kube-system_464df844856c9d5461cb184edc4974c9_45
[root@master-1 ~]# docker logs -f 0c1c0042b8c2
I1120 14:25:26.120729 1 server.go:553] external host was not specified, using 192.168.0.11
I1120 14:25:26.122152 1 server.go:161] Version: v1.22.3
I1120 14:25:26.836619 1 shared_informer.go:240] Waiting for caches to sync for node_authorizer
I1120 14:25:26.838689 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I1120 14:25:26.838721 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I1120 14:25:26.840979 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I1120 14:25:26.841003 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
Error: context deadline exceeded
[root@master-1 ~]# docker ps -a |grep etcd
dfd6026ae3fd 004811815584 "etcd --advertise-cl…" 3 minutes ago Up 3 minutes k8s_etcd_etcd-master-1.host.com_kube-system_a23c864b52d59788909994fe31a97f5e_8
13c6e65046d6 004811815584 "etcd --advertise-cl…" 7 minutes ago Exited (2) 3 minutes ago k8s_etcd_etcd-master-1.host.com_kube-system_a23c864b52d59788909994fe31a97f5e_7
5ca2f134f743 registry.aliyuncs.com/google_containers/pause:3.5 "/pause" 22 minutes ago Up 22 minutes k8s_POD_etcd-master-1.host.com_kube-system_a23c864b52d59788909994fe31a97f5e_1
[root@master-1 ~]# docker logs -n 10 13c6e65046d6
{"level":"warn","ts":"2021-11-20T14:24:39.911Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"ad7fc708963cf6f3","rtt":"0s","error":"dial tcp 192.168.0.9:2380: i/o timeout"}
{"level":"warn","ts":"2021-11-20T14:24:39.915Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"c68a49f4a0c3cea9","rtt":"0s","error":"dial tcp 192.168.0.10:2380: connect: no route to host"}
{"level":"warn","ts":"2021-11-20T14:24:39.915Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"c68a49f4a0c3cea9","rtt":"0s","error":"dial tcp 192.168.0.10:2380: connect: no route to host"}
{"level":"info","ts":"2021-11-20T14:24:40.658Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"cb18584c4f4dbfc is starting a new election at term 7"}
{"level":"info","ts":"2021-11-20T14:24:40.658Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"cb18584c4f4dbfc became pre-candidate at term 7"}
{"level":"info","ts":"2021-11-20T14:24:40.658Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"cb18584c4f4dbfc received MsgPreVoteResp from cb18584c4f4dbfc at term 7"}
{"level":"info","ts":"2021-11-20T14:24:40.658Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"cb18584c4f4dbfc [logterm: 7, index: 3988] sent MsgPreVote request to ad7fc708963cf6f3 at term 7"}
{"level":"info","ts":"2021-11-20T14:24:40.658Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"cb18584c4f4dbfc [logterm: 7, index: 3988] sent MsgPreVote request to c68a49f4a0c3cea9 at term 7"}
{"level":"warn","ts":"2021-11-20T14:24:41.729Z","caller":"etcdhttp/metrics.go:166","msg":"serving /health false; no leader"}
{"level":"warn","ts":"2021-11-20T14:24:41.729Z","caller":"etcdhttp/metrics.go:78","msg":"/health error","output":"{\"health\":\"false\",\"reason\":\"RAFT NO LEADER\"}","status-code":503}
So etcd failed to elect a leader? Can't a single etcd member serve on its own? Any advice would be appreciated.
#1 suifengdang666 2021-11-20 23:00:28 +08:00
To avoid split-brain, etcd uses the Raft algorithm: the cluster can only serve while a majority of members are online, i.e. at least N/2+1 members must be up before a leader can be elected.
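The majority rule above can be checked with a few lines of shell (illustrative only): for N members, quorum is floor(N/2)+1, so the number of tolerated failures is N minus quorum. This is exactly why a 3-master cluster survives one shutdown but not two.

```shell
# Raft quorum: a cluster of N members needs floor(N/2)+1 members up
# to elect a leader, so it tolerates floor((N-1)/2) failures.
for n in 1 3 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
done
# members=1 quorum=1 tolerated_failures=0
# members=3 quorum=2 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```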
#2 cs419 2021-11-20 23:35:23 +08:00
That's just how high-availability clusters are designed. While all nodes are alive, requests are spread across them to share the load. Once more than half the nodes are down, the cluster refuses service, because the HA guarantee is broken. Refusing service means that after you repair the nodes, the cluster can resume working normally; if it kept serving and the traffic then overwhelmed the remaining nodes, the data could no longer be fully repaired. If you want a single node to be usable on its own, start it as a single node from the beginning instead of creating a cluster.
#3 limao693 2021-11-20 23:49:04 +08:00 via iPhone
Raft works normally as long as a majority of members are up.
#4 chih758 2021-11-21 01:15:30 +08:00 via Android
In a test environment you can `etcdctl member remove` the two dead nodes from the cluster; after that it can run as a single member.
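For reference, a rough sketch of that flow (the member ID below is taken from the etcd log lines above, where `ad7fc708963cf6f3` is the peer at 192.168.0.9). One caveat: membership changes are themselves Raft writes, so this only works while the cluster still has quorum, i.e. remove the first dead member before taking the second one down.

```shell
# Sketch only: given a (sample) line of `etcdctl member list` output,
# the first comma-separated field is the member ID to remove.
sample='ad7fc708963cf6f3, started, master-2, https://192.168.0.9:2380, https://192.168.0.9:2379, false'
member_id=${sample%%,*}
echo "etcdctl member remove $member_id"   # run this against the live endpoint
```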
#5 caicaiwoshishui (OP)
@cs419 Thanks! A follow-up: if more than half the nodes are down and restarting them doesn't bring them back, can I add new machines to the cluster? The problem is that kubectl no longer works and kubeadm can't reach the master nodes either — how would I go about that?
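Once quorum is already lost, `etcdctl member remove` can no longer go through (membership changes need a leader). One commonly used escape hatch is etcd's `--force-new-cluster` flag, which restarts the surviving member as a one-node cluster while keeping its data; replacement masters can then be re-joined. A hedged sketch, assuming a kubeadm stacked etcd whose static pod manifest lives at `/etc/kubernetes/manifests/etcd.yaml`:

```yaml
# /etc/kubernetes/manifests/etcd.yaml (static pod) -- fragment, not the full file
spec:
  containers:
  - name: etcd
    command:
    - etcd
    - --data-dir=/var/lib/etcd
    - --force-new-cluster   # reset membership to this single node, keeping its data
    # ...existing flags unchanged
```

The flag must not stay set: once etcd is healthy again, remove it and let the pod restart normally.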
#6 caicaiwoshishui (OP)
@chih758 Just tested it: with 2 machines shut down and one left, I used docker exec -it to get into the etcd container and set up the etcdctl certificates:

sh-5.0# export ETCDCTL_API=3
sh-5.0# alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
sh-5.0# etcdctl member list
{"level":"warn","ts":"2021-11-21T02:11:18.722Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003a4700/#initially=[https://127.0.0.1:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded

It times out. In other words, the etcd on the one remaining machine keeps timing out, and docker exits the container.
#7 caicaiwoshishui (OP)
@suifengdang666 One more question: in production, for a kubeadm-created k8s cluster, is etcd deployed separately, or do you use the etcd that kubeadm sets up?
#8 suifengdang666 2021-11-21 21:56:12 +08:00
@caicaiwoshishui The one kubeadm creates is fine. If you're worried that high load on the masters could destabilize etcd, you can set up a few separate VMs as a dedicated etcd cluster.
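kubeadm supports that "external etcd" topology directly via the `etcd.external` section of its ClusterConfiguration. A minimal sketch (the endpoint addresses are assumptions to adapt to your environment):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    endpoints:                        # assumed addresses of the dedicated etcd VMs
      - https://192.168.0.21:2379
      - https://192.168.0.22:2379
      - https://192.168.0.23:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```

Note the quorum rule applies to the external etcd cluster too, so it should itself have an odd number of members.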
#9 pmispig 2021-11-21 23:41:28 +08:00
Deploy etcd and the kube-apiserver on separate servers.
#10 0x208 2021-12-13 16:45:28 +08:00
OP, are you job hunting? Take a look at my hiring post.
#11 caicaiwoshishui (OP)
@0x208 Is remote OK? I'm not in Beijing.