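Hi everyone. Following up on yesterday's announcement that my homelab went down, here's a quick update on where things stand. This is unedited and purely my own account. Enjoy!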
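It turns out the whole homelab wasn't down after all: two Kubernetes nodes somehow survived the power outage.
Two Kubernetes control plane nodes.
Kubernetes really wants an odd number of control plane nodes, my workloads are too heavy for any single node to run, and Longhorn wants at least three nodes online. So I had to shut them down.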
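The odd-number preference comes down to majority arithmetic. Here is a minimal sketch, assuming the control plane's datastore is etcd with its usual majority quorum (the post doesn't spell that out); the function names `quorum` and `tolerated` are made up for illustration:

```shell
#!/bin/sh
# quorum N: the smallest majority of N members (integer division).
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated N: how many members can fail while a majority is still up.
tolerated() {
    echo $(( $1 - $(quorum "$1") ))
}

# Print the table for small cluster sizes.
for n in 1 2 3 4 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated "$n")"
done
```

Going from three members to four doesn't buy any extra failure tolerance, which is why odd counts are the norm.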
How did I get in? Through the Mac mini that I use for Anubis CI. Either it somehow powered itself back on when the grid reset, or it survived the outage.

    xe@t-elos:~$ uptime
     09:45:55 up 66 days,  9:51,  4 users,  load average: 0.37, 0.22, 0.18

Holy cow, that's good to know: 66 days of uptime means it rode out the outage instead of rebooting.
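Anyway, none of the usual debugging moves worked (`kubectl get nodes` timed out, and so on), so I ran `nmap` across the whole home subnet. Normally that subnet is so crowded with devices that it's hard to pick anything out. This time it was nearly empty. What stood out was this: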

    Nmap scan report for kos-mos (192.168.2.236)
    Host is up, received arp-response (0.00011s latency).
    Scanned at 2026-03-18 09:23:09 EDT for 1s
    Not shown: 996 closed tcp ports (reset)
    PORT      STATE SERVICE   REASON
    3260/tcp  open  iscsi     syn-ack ttl 64
    9100/tcp  open  jetdirect syn-ack ttl 64
    50000/tcp open  ibm-db2   syn-ack ttl 64
    50001/tcp open  unknown   syn-ack ttl 64
    MAC Address: FC:34:97:0D:1E:CD (Asustek Computer)

    Nmap scan report for ontos (192.168.2.237)
    Host is up, received arp-response (0.00011s latency).
    Scanned at 2026-03-18 09:23:09 EDT for 1s
    Not shown: 996 closed tcp ports (reset)
    PORT      STATE SERVICE   REASON
    3260/tcp  open  iscsi     syn-ack ttl 64
    9100/tcp  open  jetdirect syn-ack ttl 64
    50000/tcp open  ibm-db2   syn-ack ttl 64
    50001/tcp open  unknown   syn-ack ttl 64
    MAC Address: FC:34:97:0D:1F:AE (Asustek Computer)

Those two machines are Kubernetes control plane nodes! I can't SSH into them because they run Talos Linux, but I can use `talosctl` (over port 50000) to shut them down:

    $ ./bin/talosctl -n 192.168.2.236 shutdown --force
    WARNING: 192.168.2.236: server version 1.9.1 is older than client version 1.12.5
    watching nodes: [192.168.2.236]
        * 192.168.2.236: events check condition met
    $ ./bin/talosctl -n 192.168.2.237 shutdown --force
    WARNING: 192.168.2.237: server version 1.9.1 is older than client version 1.12.5
    watching nodes: [192.168.2.237]
        * 192.168.2.237: events check condition met

Now they're offline and will stay that way until I get home.
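For what it's worth, the scan-then-shutdown steps above could be glued together roughly like this. It's a sketch, not what I actually ran: `talos_hosts`, `shutdown_nodes`, and the `TALOSCTL` override are hypothetical names, and it assumes nmap's normal text output in the shape shown above, with port 50000 as the only signal that a host is a Talos node.

```shell
#!/bin/sh
# talos_hosts: read nmap's normal output on stdin and print the IP of every
# host that reports 50000/tcp open (assumption: that port means Talos).
talos_hosts() {
    awk '
        /^Nmap scan report for/ {
            ip = $NF              # last field is "(1.2.3.4)" or a bare IP
            gsub(/[()]/, "", ip)  # strip the parentheses if present
        }
        /^50000\/tcp[ \t]+open/ { print ip }
    '
}

# shutdown_nodes: run the same forced shutdown as above against each IP.
# TALOSCTL is a hypothetical override so the loop can be pointed at a stub
# for a dry run; it defaults to whatever talosctl is on PATH.
shutdown_nodes() {
    for ip in "$@"; do
        "${TALOSCTL:-talosctl}" -n "$ip" shutdown --force
    done
}

# Example (the /24 range is an assumption about the subnet):
#   shutdown_nodes $(nmap 192.168.2.0/24 | talos_hosts)
```

Keeping the parse and the shutdown as separate functions means you can eyeball the host list before anything gets powered off.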
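Those two nodes staying up is what had been knocking the sponsor panel offline: the external DNS service in the homelab was still running and fighting my freshly deployed cloud server for control of DNS. The sponsor panel is back online now (I should have put it in the cloud in the first place, that's on me), and peace has been restored to most of the galaxy, at least as far as I can tell from here.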
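Action items:
- Figure out why ontos and kos-mos came back online
- Make every node in the homelab power back on when wall power returns
- Check whether the homelab's power supplies were damaged
- Re-evaluate my use of Talos Linux; maybe switch to Rocky?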