1. etcd troubleshooting: database file (xxx) does not match with snapshot

When you hit this kind of failure, start by checking the detailed logs: journalctl -u kube-etcd -f

database file (/prophet/k8s/lib/etcd/member/snap/db index 1014209502) does not match with snaps……
Failed to start Etcd Server
……
skipped unexpected non snapshot file 000000003c200.snap.db

**Diagnosis:** the logs above point strongly to corrupted data on one etcd node: its index does not match the snapshot. This can be verified with the following command:

# raw shell
ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/cert/ca.pem --cert /etc/etcd/cert/etcd.pem --key /etc/etcd/cert/etcd-key.pem --endpoints https://172.27.128.120:2379,https://172.27.128.105:2379,https://172.27.128.119:2379 endpoint status -w table
# The deployment scripts wrap this command; cd into the installation package first
[root@xingye01 prophet-ee-3.8.1]# bin/etcd_v3.sh endpoint status  -w table
2020-06-17 22:51:47.425692 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://172.27.133.133:2379 | 7bd2bf981000a381 | 3.1.5   | 28 MB   | false     |      4873 | 1014209404 |
| https://172.27.133.134:2379 | 3fa7c2c61eeb2e6c | 3.1.5   | 28 MB   | false     |      4873 | 1014209403 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
[root@xingye01 prophet-ee-3.8.1]#

Check the RAFT INDEX across the 3 nodes: if the values are equal, or differ by about 1 (the internal algorithm can leave them slightly apart), the data can be considered consistent. DB SIZE is also worth comparing.
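If you want to compare the indexes programmatically, endpoint status can also emit JSON; a minimal sketch, assuming jq is installed, that the wrapper forwards flags the same way, and that the field names match what etcd v3.1 emits:

# print endpoint / raft index pairs for easy comparison (requires jq)
bin/etcd_v3.sh endpoint status -w json \
  | jq '.[] | {endpoint: .Endpoint, raftIndex: .Status.raftIndex}'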
The db path shown in the log is the path inside the etcd container; we need the corresponding path on the host (the log actually already reveals it).
Run systemctl cat kube-etcd and look for WorkingDirectory:

# /usr/lib/systemd/system/kube-etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/mnt/disk01/data/k8s/lib/etcd/
EnvironmentFile=-/mnt/disk01/data/k8s/etc/kubernetes/etcd
ExecStart=/mnt/disk01/data/k8s/bin/etcd $CERT_ARGS $CLUSTET_ARGS $ARGS
Restart=always
RestartSec=5
StartLimitInterval=0
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
[root@ts-kvm10-sage361 ~]# ll /mnt/disk01/data/k8s/lib/etcd/
total 4
drwx------ 4 sage sage 4096 Jun  4 16:05 member
[root@ts-kvm10-sage361 ~]#

First, a bit of background: this directory holds etcd's data, and the layout typically looks like this:

etcd
└── member
    ├── snap
    │   ├── 0000000000000006-00000000009e22ab.snap
    │   ├── 0000000000000006-00000000009e49bc.snap
    │   ├── 0000000000000006-00000000009e70cd.snap
    │   ├── 0000000000000006-00000000009e97de.snap
    │   ├── 0000000000000006-00000000009ebeef.snap
    │   └── db
    └── wal
        ├── 000000000000006e-000000000095d7cf.wal
        ├── 000000000000006f-000000000099a568.wal
        ├── 0000000000000070-00000000009b0142.wal
        ├── 0000000000000071-00000000009c5cdb.wal
        ├── 0000000000000072-00000000009db309.wal
        └── 0.tmp

The corrupted data lives in this etcd/member/snap/db file (db is the boltdb backend file holding the actual key-value data; the .snap and .wal files are raft snapshots and write-ahead logs).
Since etcd runs as a three-node cluster and the data is replicated between members, we can simply remove the data directory on this node (to be safe, just rename it); afterwards it will pull the data back from the other two nodes (note that this may consume significant network bandwidth).
From WorkingDirectory we already know the etcd data directory is /mnt/disk01/data/k8s/lib/etcd/

cd /mnt/disk01/data/k8s/lib/
mv etcd etcd-old
mkdir etcd
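Since the unit file sets Restart=always, it is safer to stop the service before touching the data directory; a slightly fuller sketch (the sage owner comes from the ll output above):

# stop the service so systemd does not restart etcd mid-rename
systemctl stop kube-etcd
cd /mnt/disk01/data/k8s/lib/
mv etcd etcd-old      # keep the old data until the cluster is healthy again
mkdir etcd
chown sage:sage etcd  # match the owner shown by ll above
systemctl start kube-etcd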

Wait for etcd to come back (it will be pulled back up automatically thanks to Restart=always in the unit file), or restart it manually; give etcd some time to finish syncing data, then check its logs again.

If kube-etcd still fails to start at this point, check the logs again; this time they show: member 36c805454ffdfd has already been bootstrapped

the server is already initialized as member before,starting as etcd member……
skipped unexpected non snapshot file 00000003cp7c200.snap.db
member 36c805454ffdfd has already been bootstrapped

According to the references:
One of the member was bootstrapped via discovery service. You must remove the previous data-dir to clean up the member information. Or the member will ignore the new configuration and start with the old configuration. That is why you see the mismatch.
With that, the cause is clear: startup fails because the member information recorded in the data-dir (/var/lib/etcd/default.etcd) does not match the information implied by etcd's startup options.
Fixing the problem:

# The earlier kube-etcd restart triggered a data sync; clear the data out first
rm -rf /mnt/disk01/data/k8s/lib/etcd/*

Run systemctl cat kube-etcd to find the environment file: EnvironmentFile=-/4pd/etcd/etc/kubernetes/etcd

# cluster listen
CLUSTET_ARGS="--initial-advertise-peer-urls https://172.27.133.133:2380 \
    --listen-peer-urls https://172.27.133.133:2380 \
    --listen-client-urls https://172.27.133.133:2379 \
    --advertise-client-urls https://172.27.133.133:2379  \
    --initial-cluster-token etcd-cluster-0  \
    --initial-cluster 172.27.133.133=https://172.27.133.133:2380,172.27.133.134=https://172.27.133.134:2380,172.27.233.12=https://172.27.233.12:2380 \
    --initial-cluster-state new \ # change this from new to existing, and startup succeeds!
    --auto-compaction-retention=1 \
    --quota-backend-bytes=6442450944 \
    --heartbeat-interval=250 \
    --election-timeout=2000"

Change the --initial-cluster-state parameter from new to existing, then restart kube-etcd.
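One minimal way to script the edit, assuming the env file path from the EnvironmentFile line above (back it up first):

# back up the environment file, flip new -> existing, then restart
cp /4pd/etcd/etc/kubernetes/etcd /4pd/etcd/etc/kubernetes/etcd.bak
sed -i 's/--initial-cluster-state new/--initial-cluster-state existing/' /4pd/etcd/etc/kubernetes/etcd
systemctl restart kube-etcd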
Verify:

[root@xingye01 prophet-ee-3.8.1]# bin/etcd_v3.sh endpoint status  -w table
2020-06-17 23:06:57.063842 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://172.27.133.133:2379 | 7bd2bf981000a381 | 3.1.5   | 28 MB   | false     |      4873 | 1014209404 |
| https://172.27.133.134:2379 | 3fa7c2c61eeb2e6c | 3.1.5   | 28 MB   | false     |      4873 | 1014209404 |
| https://172.27.233.12:2379  | e3cde0e1ce8f49a1 | 3.1.5   | 28 MB   | true      |      4873 | 1014209404 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
[root@xingye01 prophet-ee-3.8.1]# bin/etcd_v3.sh member list  -w table
2020-06-17 23:07:04.816356 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
+------------------+---------+----------------+-----------------------------+-----------------------------+
|        ID        | STATUS  |      NAME      |         PEER ADDRS          |        CLIENT ADDRS         |
+------------------+---------+----------------+-----------------------------+-----------------------------+
| 3fa7c2c61eeb2e6c | started | 172.27.133.134 | https://172.27.133.134:2380 | https://172.27.133.134:2379 |
| 7bd2bf981000a381 | started | 172.27.133.133 | https://172.27.133.133:2380 | https://172.27.133.133:2379 |
| e3cde0e1ce8f49a1 | started | 172.27.233.12  | https://172.27.233.12:2380  | https://172.27.233.12:2379  |
+------------------+---------+----------------+-----------------------------+-----------------------------+
[root@xingye01 prophet-ee-3.8.1]#

2. Error: etcdserver: mvcc: database space exceeded

After running for a while, etcd's DB size exceeds the quota-backend-bytes limit (in the env file shown in section 1, for example, it is 6442450944 bytes, i.e. 6 GiB), raising the database space exceeded alarm:

Caused by: io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: etcdserver: mvcc: database space exceeded
    at io.grpc.Status.asRuntimeException(Status.java:530)
    at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:482)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.etcd.jetcd.ClientConnectionManager$AuthTokenInterceptor$1$1.onClose(ClientConnectionManager.java:302)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:694)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
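On the etcd side, the state can be confirmed directly: when the quota is hit, etcd raises a NOSPACE alarm, which etcdctl can list (the endpoint shown here is the one from section 1; add the usual cert flags when TLS is enabled):

# a raised NOSPACE alarm confirms the quota was exceeded
ETCDCTL_API=3 etcdctl --endpoints=https://172.27.133.133:2379 alarm list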

Cause of the exception
This error message is returned by the etcd server to tell the client application that the server has run out of space. The FAQ section of the official etcd documentation covers this scenario explicitly:
Q: What does mvcc: database space exceeded mean and how do I fix it?
A: etcd's multi-version concurrency control (MVCC) data model keeps an exact history of the keyspace. Unless this history is periodically compacted (e.g. by setting --auto-compaction), etcd will eventually run out of storage space. When it does, etcd raises a space quota alarm to protect the cluster from further writes; as long as the alarm is raised, etcd answers write requests with the error mvcc: database space exceeded.
To recover from the low-space quota alarm:

  1. Compact etcd's history.
  2. Defragment every etcd endpoint.
  3. Disarm the alarm.

FAQ: https://etcd.io/docs/v3.4.0/faq/
**Fixing the problem**
As the FAQ describes, the problem can be resolved in 4 steps with the following commands:

# 1. Get the current revision
$ rev=$(ETCDCTL_API=3 etcdctl --endpoints=http://xxxxx:2379 endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
# 2. Compact away all history before the current revision
$ ETCDCTL_API=3 etcdctl compact $rev
compacted revision 1516
# 3. Defragment to reclaim the freed space
$ ETCDCTL_API=3 etcdctl defrag
Finished defragmenting etcd member[127.0.0.1:2379]
# 4. Disarm the alarm
$ ETCDCTL_API=3 etcdctl alarm disarm
memberID:13803658152347727308 alarm:NOSPACE
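A quick write test then confirms that etcd accepts writes again; the key name below is arbitrary:

# etcdctl prints OK when the write succeeds
$ ETCDCTL_API=3 etcdctl put test_key test_value
OK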

If the commands above complete without errors and the test write succeeds, the space has been reclaimed. Do not skip the final disarm step: the alarm is just a flag, with no direct logical relationship to how much space is actually in use, so even after the space is freed etcd will keep rejecting writes with the out-of-space error until the alarm is disarmed. Besides manual compaction, automatic compaction can be enabled:

# keep one hour of history
$ etcd --auto-compaction-retention=1

Automatic compaction behaves slightly differently across etcd versions; see https://etcd.io/#history-compaction for details.
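For example, etcd v3.3 and later add an explicit --auto-compaction-mode flag; a sketch (not applicable to the v3.1.5 cluster above):

# periodic mode keeps a sliding time window of history; revision mode keeps the last N revisions
$ etcd --auto-compaction-mode=periodic --auto-compaction-retention=1h
$ etcd --auto-compaction-mode=revision --auto-compaction-retention=1000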
Additional details
Defragmentation
After the keyspace is compacted, internal fragmentation remains: the freed space can be reused by etcd, but it is not released back to the filesystem until the member is defragmented, e.g.:

$ etcdctl defrag
Finished defragmenting etcd member[127.0.0.1:2379]

The command above only acts on the endpoint it is run against; it is not replicated across the cluster. Use the --cluster flag to automatically discover all cluster members and defragment each of them:

$ etcdctl defrag --cluster
Finished defragmenting etcd member[http://127.0.0.1:2379]
Finished defragmenting etcd member[http://127.0.0.1:22379]
Finished defragmenting etcd member[http://127.0.0.1:32379]
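Afterwards, re-running endpoint status should show the DB SIZE of each member shrink well below the quota:

# compare DB SIZE before and after the defrag
$ ETCDCTL_API=3 etcdctl endpoint status -w table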

How to inspect the data stored in etcd
https://blog.csdn.net/weixin_33749131/article/details/89759487?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2allsobaiduend~default-4-89759487.nonecase&utm_term=etcd%E6%95%B0%E6%8D%AE%E5%A6%82%E4%BD%95%E6%9F%A5%E7%9C%8B&spm=1000.2123.3001.4430
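For a quick look straight from the shell, the keys can also be listed with etcdctl; a minimal sketch (add the same --cacert/--cert/--key flags as in section 1 when TLS is enabled):

# list the first few keys under the root prefix without printing values
$ ETCDCTL_API=3 etcdctl get / --prefix --keys-only | head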

Reference: https://www.cnblogs.com/davygeek/p/8524477.html