[K8s] Kubernetes Installation and Deployment Guide (V2)
0 Preface
Once you are familiar with the basic concepts and inner workings of K8s, you can try deploying a K8s cluster yourself.
K8s Overview - cnblogs/千千寰宇
This article differs from: [Virtualization/Cloud Native] Kubernetes Installation and Deployment Guide - cnblogs/千千寰宇
Previous article: docker-based CRI/runtime, k8s server version: 1.25.0 / "exec-opts": ["native.cgroupdriver=systemd"]
This article: containerd-based CRI/runtime, k8s server version: 1.28.0
1 K8s Installation and Deployment
Environment and prerequisites
CentOS 7 servers x N (N≥3)
2 CPU cores and 2 GB RAM per server
Update packages from the YUM mirror
yum update
yum upgrade
Step 1 Control-plane + worker nodes: install and run Docker
yum -y install wget
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
yum -y install docker-ce
//check the version
docker version
systemctl enable docker
systemctl start docker
Step 2 Control-plane + worker nodes: install kubeadm / kubectl / kubelet
Configure the Aliyun repository
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
#baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
#gpgcheck=0
repo_gpgcheck=1
#repo_gpgcheck=0
#gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
#exclude=kubelet kubeadm kubectl
EOF
//refresh the YUM repositories
yum update
Disable SELinux
Set SELinux to permissive mode, which effectively disables it.
Running setenforce 0 and the sed command below puts SELinux into permissive mode, effectively disabling it. This is required to let containers access the host filesystem, which the Pod network needs in order to work correctly.
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
Disable swap
You must disable the swap partition for the kubelet to work properly.
For example, swapoff -a disables swap temporarily. To make this change persist across reboots, make sure swap is also disabled in configuration files such as /etc/fstab or systemd.swap, depending on how your system is configured.
swapoff -a
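The /etc/fstab edit mentioned above can be scripted. A minimal sketch, demonstrated on a scratch copy so it is runnable anywhere; on a real node the target is /etc/fstab and the edit needs root:

```shell
# Comment out every active fstab entry whose filesystem type is swap,
# so the swap partition stays disabled after a reboot.
set -e
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
/dev/mapper/centos-root /     xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0
EOF

# \1 re-emits the matched line behind a leading '#'
sed -ri 's|^([^#].*[[:space:]]swap[[:space:]].*)$|#\1|' "$fstab"
cat "$fstab"
```

After running this, only the swap entry is commented out; the root filesystem entry is untouched.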
Install and enable the kubelet
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
//enable it to start on boot
sudo systemctl enable --now kubelet
Step 3 Deploy the control-plane node
Inspect kubeadm's default init configuration
# kubeadm config print init-defaults
...
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.28.0
...
The apiVersion and kubernetesVersion here must match the kubeadm.yml written below.
kind = InitConfiguration: defines bootstrap settings, such as the initial token and the apiserver address
kind = ClusterConfiguration: defines settings for control-plane (master) components such as the apiserver, etcd, network, scheduler, and controller-manager
kind = KubeletConfiguration: defines settings for the kubelet component
kind = KubeProxyConfiguration: defines settings for the kube-proxy component
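The four kinds above can be combined in a single kubeadm config file, separated by `---`. A minimal sketch for orientation; the token and address are placeholders, not values from this deployment:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
bootstrapTokens:
- token: "abcdef.0123456789abcdef"   # placeholder bootstrap token
localAPIEndpoint:
  advertiseAddress: "192.168.0.10"   # placeholder apiserver address
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.28.0
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"                         # or leave "" for the iptables default
```

kubeadm reads all documents from the file passed to --config; kinds you omit fall back to their defaults.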
Write kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.28.0
imageRepository: registry.aliyuncs.com/google_containers
#imageRepository: k8s.gcr.io
controllerManager: {}
dns: {}   # in v1beta3 the dns.type field is gone; CoreDNS is the only supported DNS
apiServer:
  extraArgs: # extraArgs is made up of key: value pairs
    runtime-config: "api/all=true"
etcd:
  local:
    dataDir: /data/k8s/etcd
scheduler: {}
kubeadm's ClusterConfiguration object exposes an extraArgs field, which overrides the default flags passed to control-plane components such as the APIServer, ControllerManager, and Scheduler.
Enable kubelet.service
systemctl enable kubelet.service
Start the container runtime: containerd
rm -rf /etc/containerd/config.toml
//adjust the containerd configuration and add a registry mirror:
// 1) start from the default configuration
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml
//edit /etc/containerd/config.toml :
...
[plugins."io.containerd.grpc.v1.cri"]
  # change: 1 line
  # sandbox_image = "registry.k8s.io/pause:3.6"
  sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
  [plugins."io.containerd.grpc.v1.cri".registry]
...
    # add: 3+2 lines (not counting comments and blank lines)
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
        # get an Aliyun mirror accelerator address from https://cr.console.aliyun.com/cn-hangzhou/instances/mirrors
        endpoint = ["https://xxx.mirror.aliyuncs.com", "https://registry-1.docker.io"]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"]
        endpoint = ["https://registry.aliyuncs.com/google_containers"]
...
systemctl restart containerd
systemctl status containerd
Make bridged traffic visible to iptables (IPv4 and IPv6)
cd /etc/sysctl.d/
vi k8s-sysctl.conf
//add the following lines:
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
ls /etc/sysctl.d/k8s-sysctl.conf
//apply immediately
sudo sysctl --system
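Instead of editing the file in vi, the two keys can be written with a heredoc. A sketch against a scratch directory; on a real node write to /etc/sysctl.d as root and load with `sysctl --system`:

```shell
set -e
dir=$(mktemp -d)

# Write the bridge-netfilter keys the kube-proxy/CNI stack relies on
cat > "$dir/k8s-sysctl.conf" <<'EOF'
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

cat "$dir/k8s-sysctl.conf"
```

Note that these kernel keys only exist once the br_netfilter module is loaded (`modprobe br_netfilter`); on a fresh CentOS 7 host you may need to load it before `sysctl --system` can apply them.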
Time synchronization (all machines, optional)
# sudo yum install -y ntpdate
//run the time sync
# sudo ntpdate time.windows.com
6 Apr 00:57:35 ntpdate[8944]: step time server 52.231.114.183 offset -0.921723 sec
Initialize the control-plane node: kubeadm / kubelet
Control-plane initialization
This step generates important files such as /var/lib/kubelet/config.yaml.
//run the initialization
# kubeadm init --config ~/k8s-deployments/kubeadm.yaml
[init] Using Kubernetes version: v1.28.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local vm-a] and IPs [10.96.0.1 192.168.xx.211]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost vm-a] and IPs [192.168.xx.211 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost vm-a] and IPs [192.168.xx.211 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 7.507281 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node vm-a as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node vm-a as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: qi82d7.glltv3hltpe4aq08
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.xx.211:6443 --token qi82d7.glltv3hltpe4aq08 \
--discovery-token-ca-cert-hash sha256:6054b8402053e9eb8f6cb134c066f3e28ae80aa5fd28cec002af1f4199383890
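The --discovery-token-ca-cert-hash value is not arbitrary: it is the SHA-256 digest of the cluster CA's DER-encoded public key, so it can be recomputed on the control-plane node from /etc/kubernetes/pki/ca.crt at any time. A sketch of that pipeline, using a throwaway self-signed CA so it is runnable anywhere:

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for /etc/kubernetes/pki/ca.crt: a throwaway self-signed CA
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=kubernetes" \
  -keyout "$tmp/ca.key" -out "$tmp/ca.crt" 2>/dev/null

# SHA-256 over the DER-encoded public key of the CA certificate
hash=$(openssl x509 -pubkey -noout -in "$tmp/ca.crt" \
  | openssl pkey -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 \
  | awk '{print $NF}')

echo "sha256:$hash"
```

On a real control-plane node, point the first openssl x509 command at /etc/kubernetes/pki/ca.crt and the printed value matches what `kubeadm token create --print-join-command` emits.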
Notes:
The cluster's Service network (a private CIDR) = 10.96.0.0/12 is kubeadm's default Service CIDR (configurable via --service-cidr).
10.96.0.1 is normally the ClusterIP of the Kubernetes API Server (the default kubernetes Service, used by in-cluster Pods to reach the API Server). Once the cluster is up in the later steps, you can verify this:
//check the ClusterIP of the kubernetes service
# kubectl get svc kubernetes -n default
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   10h
//check the CIDR distribution across all services
# kubectl get svc --all-namespaces -o wide
NAMESPACE     NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default       kubernetes   ClusterIP   10.96.0.1     <none>        443/TCP                  10h   <none>
kube-system   kube-dns     ClusterIP   10.96.0.10    <none>        53/UDP,53/TCP,9153/TCP   10h   k8s-app=kube-dns
If you did not save the join command in time, you can regenerate it on the master node: kubeadm token create --print-join-command
This step used a configuration file; a plain command-line invocation of kubeadm init works as well.
Step 4 Control-plane node: start using the cluster
To start using the cluster, run the following:
Run on the master node only, not on worker nodes
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Or, as the root user: export KUBECONFIG=/etc/kubernetes/admin.conf
List the cluster nodes
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
vm-a NotReady control-plane 9h v1.28.2
Step 5 Worker nodes: join the cluster
Run on worker nodes only, not on the master node
Install kubelet / kubectl / kubeadm (see the dedicated section above)
kubeadm: the command to bootstrap the cluster.
kubectl: the command-line tool for talking to the cluster.
kubelet: runs on every node in the cluster and starts Pods and containers.
Join the cluster
kubeadm join 192.168.xx.211:6443 --token qi82d7.glltv3hltpe4aq08 \
--discovery-token-ca-cert-hash sha256:6054b8402053e9eb8f6cb134c066f3e28ae80aa5fd28cec002af1f4199383890
Step 6 Control-plane node: install the CNI network plugin: Calico
Load the calico images into containerd
The approach below works around registries such as Aliyun possibly not mirroring the calico images, and around network problems reaching docker.io
//step1 pull the images
docker pull calico/cni:v3.26.1
docker pull calico/node:v3.26.1
docker pull calico/kube-controllers:v3.26.1
If docker.io is not directly reachable, use:
docker pull m.daocloud.io/docker.io/calico/cni:v3.26.1
docker pull m.daocloud.io/docker.io/calico/node:v3.26.1
docker pull m.daocloud.io/docker.io/calico/kube-controllers:v3.26.1
docker tag m.daocloud.io/docker.io/calico/cni:v3.26.1 docker.io/calico/cni:v3.26.1
docker tag m.daocloud.io/docker.io/calico/node:v3.26.1 docker.io/calico/node:v3.26.1
docker tag m.daocloud.io/docker.io/calico/kube-controllers:v3.26.1 docker.io/calico/kube-controllers:v3.26.1
//step2 import the docker images into containerd
ctr -n k8s.io images import <(docker save calico/cni:v3.26.1)
ctr -n k8s.io images import <(docker save calico/node:v3.26.1)
ctr -n k8s.io images import <(docker save calico/kube-controllers:v3.26.1)
[root@vm-a ~]# touch ~/k8s-deployments/calico.yaml
Download calico.yaml manually and copy it to `~/k8s-deployments/calico.yaml`
https://github.com/projectcalico/calico/blob/v3.26.1/manifests/calico.yaml (recommended)
or https://docs.projectcalico.org/manifests/calico.yaml (not recommended: version mismatch, v3.26.1 is required)
or https://calico-v3-25.netlify.app/archive/v3.25/manifests/calico.yaml (not recommended: version mismatch, v3.26.1 is required)
[root@vm-a ~]# kubectl apply -f ~/k8s-deployments/calico.yaml
poddisruptionbudget.policy/calico-kube-controllers configured
serviceaccount/calico-kube-controllers unchanged
serviceaccount/calico-node unchanged
configmap/calico-config unchanged
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org configured
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrole.rbac.authorization.k8s.io/calico-node configured
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrolebinding.rbac.authorization.k8s.io/calico-node unchanged
daemonset.apps/calico-node configured
deployment.apps/calico-kube-controllers configured
Check the running state
// once Calico is ready, check CoreDNS
[root@vm-a ~]# kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-7ddc4f45bc-7nvqn 1/1 Running 0 9m52s 172.16.51.65 vm-a <none> <none>
calico-node-sm5hw 1/1 Running 0 9m52s fd00:6868:6868::52b vm-a <none> <none>
coredns-66f779496c-52htv 1/1 Running 0 2d1h 172.16.51.67 vm-a <none> <none>
coredns-66f779496c-tsgrb 1/1 Running 0 2d1h 172.16.51.66 vm-a <none> <none>
etcd-vm-a 1/1 Running 1 2d1h fd00:6868:6868::af9 vm-a <none> <none>
kube-apiserver-vm-a 1/1 Running 1 2d1h fd00:6868:6868::af9 vm-a <none> <none>
kube-controller-manager-vm-a 1/1 Running 3 (53m ago) 2d1h fd00:6868:6868::52b vm-a <none> <none>
kube-proxy-7crkv 1/1 Running 0 2d1h fd00:6868:6868::af9 vm-a <none> <none>
kube-scheduler-vm-a 1/1 Running 5 (53m ago) 2d1h fd00:6868:6868::52b vm-a <none> <none>
//if CoreDNS is still stuck, restart it
[root@vm-a ~]# kubectl rollout restart deployment coredns -n kube-system
//=== node status ===
[root@vm-a ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
vm-a Ready control-plane 2d1h v1.28.2 fd00:6868:6868::52b <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 containerd://1.6.33
Step 7 Verify the deployment: control-plane node
Check the kubelet's state and logs
sudo systemctl status kubelet --no-pager
sudo journalctl -xeu kubelet -n 50 --no-pager | tail -30
Check whether the images were pulled successfully
# sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock images | grep pause
registry.aliyuncs.com/google_containers/pause 3.9 e6f1816883972 322kB
Check the running containers
# sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
1f57bb4c9ab9d ea1030da44aa1 6 minutes ago Running kube-proxy 0 4d286784c3ddb kube-proxy-7crkv
280d71fdbe867 f6f496300a2ae 6 minutes ago Running kube-scheduler 1 9c57afb350343 kube-scheduler-vm-a
0b32789371828 4be79c38a4bab 6 minutes ago Running kube-controller-manager 1 14ab1f95b380a kube-controller-manager-vm-a
2e821a289e1ca 73deb9a3f7025 6 minutes ago Running etcd 1 b06ec8f19d648 etcd-vm-a
62288683995c3 bb5e0dde9054c 6 minutes ago Running kube-apiserver 1 e4b6ad50d3758 kube-apiserver-vm-a
For comparison, also check docker:
docker ps -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
Note: the container runtime has indeed been switched to containerd (docker is running nothing)
Z FAQ: K8s installation and deployment
Q: How to reset a k8s node?
# 1. Reset kubeadm (removes all cluster configuration and containers)
sudo kubeadm reset -f
# 2. Remove leftover configuration files and directories
sudo rm -rf /etc/kubernetes/
sudo rm -rf /var/lib/kubelet/
sudo rm -rf /var/lib/etcd/
sudo rm -rf ~/.kube/
# 3. Clean up the CNI network configuration
sudo rm -rf /etc/cni/net.d/
sudo rm -rf /var/lib/cni/
# 4. Flush iptables rules (optional, but recommended)
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
# 5. Restart the kubelet (or the whole machine)
sudo systemctl restart kubelet
# or reboot to make sure everything is clean
sudo reboot
Check the kubelet's state and any errors:
sudo systemctl status kubelet
sudo journalctl -xeu kubelet -n 100 --no-pager | tail -50
Q: kubeadm init fails with a timeout waiting for the control-plane components?
Root cause analysis
This kind of error means kubeadm init timed out waiting for the control-plane components to come up. The most common causes are the kubelet not running properly or a misconfigured container runtime.
1. Check the kubelet process
sudo systemctl status kubelet
If it shows inactive (dead) or a failed state, try starting it:
sudo systemctl start kubelet
sudo systemctl enable kubelet
2. Inspect the kubelet logs in detail
sudo journalctl -xeu kubelet -n 200 --no-pager
Watch for errors such as:
failed to run Kubelet
node "xxx" not found
cannot find cgroup
container runtime is down
3. Check the container runtime (containerd)
# check whether containerd is running
sudo systemctl status containerd
# if not, start it
sudo systemctl start containerd
sudo systemctl enable containerd
# verify containerd's state
sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock version
4. Check the control-plane containers
# list all Kubernetes-related containers
sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause
# if a container failed, inspect its logs (replace CONTAINERID)
sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID
Common causes and fixes
Cause A: kubelet misconfiguration
Check the kubelet configuration file:
cat /var/lib/kubelet/config.yaml
Common problems:
cgroupDriver does not match containerd's (it should be systemd)
node name resolution issues
Fix a cgroupDriver mismatch:
# check containerd's cgroup driver
sudo cat /etc/containerd/config.toml | grep SystemdCgroup
# make sure the kubelet uses the same driver
# edit /var/lib/kubelet/config.yaml and set:
# cgroupDriver: systemd
# restart the services
sudo systemctl restart containerd
sudo systemctl restart kubelet
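The edit described in the comments above can be scripted. A sketch on a scratch copy; on a real node the file is /var/lib/kubelet/config.yaml, the edit needs root, and the kubelet must be restarted afterwards:

```shell
set -e
cfg=$(mktemp)

# A minimal stand-in for /var/lib/kubelet/config.yaml
cat > "$cfg" <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: cgroupfs
EOF

# Align the kubelet's cgroup driver with containerd's (systemd)
sed -i 's/^cgroupDriver: .*/cgroupDriver: systemd/' "$cfg"
grep '^cgroupDriver' "$cfg"
```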
Cause B: image pulls fail (very common inside mainland China)
The control-plane images cannot be pulled from registry.k8s.io.
Fix: use a domestic mirror
Automatic: see the section on configuring /etc/containerd/config.toml (verified)
Manual (not verified):
# check whether kubeadm.yaml specifies an imageRepository
grep imageRepository ~/k8s-deployments/kubeadm.yaml
# if not, add this to kubeadm.yaml:
# imageRepository: registry.aliyuncs.com/google_containers
# or pull the images manually
sudo kubeadm config images pull --image-repository=registry.aliyuncs.com/google_containers
# then re-run the initialization
sudo kubeadm init --config ~/k8s-deployments/kubeadm.yaml
Cause C: a previous reset was incomplete
If kubeadm reset was run before but etcd or network configuration was left behind:
# clean up thoroughly
sudo kubeadm reset -f
sudo rm -rf /etc/kubernetes/ /var/lib/kubelet/ /var/lib/etcd/ /var/lib/cni/ /etc/cni/
sudo rm -rf ~/.kube/
# remove leftover network interfaces
sudo ip link delete cni0 2>/dev/null || true
sudo ip link delete flannel.1 2>/dev/null || true
# reboot
sudo reboot
Recommended troubleshooting flow
# 1. Look at the actual error in the logs first
sudo journalctl -xeu kubelet -n 100 --no-pager | tail -50
# 2. Act on the error type; common cases:
# case 1: "connection refused" talking to containerd
sudo systemctl restart containerd
sudo systemctl restart kubelet
# case 2: "ImagePullBackOff" or other image-related errors
sudo kubeadm config images pull --image-repository=registry.aliyuncs.com/google_containers
# case 3: cgroup-related errors
# edit /var/lib/kubelet/config.yaml and make sure cgroupDriver: systemd
sudo systemctl restart kubelet
# 3. Re-initialize
sudo kubeadm init --config ~/k8s-deployments/kubeadm.yaml
This way you see the actual error instead of just a timeout.
Q: Docker Engine does not implement the CRI that a container runtime needs to work with Kubernetes. How to solve this?
Docker Engine does not implement the CRI required for a container runtime to work with Kubernetes, so an extra service, cri-dockerd, must be installed.
cri-dockerd is a project that preserves the legacy built-in Docker Engine support that was removed from the kubelet in version 1.24.
Q: How to uninstall kubeadm?
sudo yum remove kubelet kubeadm kubectl
Q: How to reinstall a specific version of kubeadm?
yum install kubelet-1.23.17 kubeadm-1.23.17 kubectl-1.23.17 kubernetes-cni
Q: How to enable the kubelet?
sudo systemctl enable --now kubelet
Q: How to check the kubeadm version?
# kubeadm version
Q: How to change the version in kubeadm.yaml?
Change the kubernetesVersion in kubeadm.yaml, e.g. to 1.23.0
Q: How to deploy the control-plane node with kubeadm?
See above.
Q: kubectl reports "couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused"?
Symptom
# kubectl get nodes
E0406 09:59:50.133311 30401 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0406 09:59:50.133757 30401 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0406 09:59:50.160301 30401 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0406 09:59:50.161204 30401 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0406 09:59:50.201546 30401 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Root cause analysis
Root cause: kubectl needs a kubeconfig file to know how to reach the cluster. The error shows it could not find one, so it fell back to the default localhost:8080.
Fix
Method 1
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Or, as the root user: export KUBECONFIG=/etc/kubernetes/admin.conf
List the cluster nodes
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
vm-a NotReady control-plane 9h v1.28.2
Check Pod status
# kubectl get pods -n kube-system
E0406 00:39:29.018802 8634 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0406 00:39:29.019794 8634 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0406 00:39:29.022029 8634 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0406 00:39:29.024362 8634 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0406 00:39:29.024987 8634 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Q: A node stays in the NotReady state; how to troubleshoot?
NotReady means the node is not fully ready yet. The most common causes are a missing CNI (container network) plugin or failing kubelet health checks.
1. Check the node's detailed status and events
//inspect the node
# kubectl describe node vm-a
...
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Mon, 06 Apr 2026 10:12:33 +0800 Mon, 06 Apr 2026 00:10:03 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 06 Apr 2026 10:12:33 +0800 Mon, 06 Apr 2026 00:10:03 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 06 Apr 2026 10:12:33 +0800 Mon, 06 Apr 2026 00:10:03 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Mon, 06 Apr 2026 10:12:33 +0800 Mon, 06 Apr 2026 00:10:03 +0800 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
...
Focus on:
the Conditions section (especially Ready and NetworkUnavailable)
the Events section
2. Check the kubelet logs
# sudo journalctl -xeu kubelet -n 100 --no-pager | tail -50
...
Apr 06 10:17:13 vm-a kubelet[7901]: E0406 10:17:13.966639 7901 kubelet.go:2855] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
...
3. Check system Pod status
# sudo kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-66f779496c-52htv 0/1 Pending 0 10h
coredns-66f779496c-tsgrb 0/1 Pending 0 10h
etcd-vm-a 1/1 Running 1 10h
kube-apiserver-vm-a 1/1 Running 1 10h
kube-controller-manager-vm-a 1/1 Running 1 10h
kube-proxy-7crkv 1/1 Running 0 10h
kube-scheduler-vm-a 1/1 Running 1 10h
Most common cause: the CNI plugin is missing
On a freshly initialized cluster you must install a CNI plugin (such as Calico, Flannel, or Weave); otherwise the node stays NotReady.
Install Calico (recommended: production environments)
//install the Calico CNI
# kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
poddisruptionbudget.policy/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
serviceaccount/calico-node created
serviceaccount/calico-cni-plugin created
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpfilters.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrole.rbac.authorization.k8s.io/calico-cni-plugin created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-cni-plugin created
daemonset.apps/calico-node created
deployment.apps/calico-kube-controllers created
Alternatively, if you need to adjust Calico's network configuration (Pod CIDR, network mode), download custom-resources.yaml first, edit it, then deploy:
curl -O https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/custom-resources.yaml
# swap the images to a domestic mirror (optional, not verified)
sed -i 's|docker.io/calico|docker.mirrors.ustc.edu.cn/calico|g' calico.yaml
Key settings to check in the file:
Pod CIDR (cidr): defaults to 192.168.0.0/16; it must match the cluster's Pod CIDR (the --cluster-cidr flag on the kube-controller-manager; check with kubectl -n kube-system get pod kube-controller-manager-xxx -o yaml | grep cluster-cidr);
Network mode (vxlanEnabled): defaults to true (VXLAN mode, no inter-node BGP configuration needed); for higher performance set it to false and enable BGP (requires extra BGP peer configuration).
After editing, deploy with kubectl apply -f custom-resources.yaml.
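For orientation, the fields usually touched in custom-resources.yaml look roughly like this. A hedged sketch of the Tigera operator's Installation resource; exact defaults vary by Calico release:

```yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - cidr: 192.168.0.0/16             # must match the cluster's Pod CIDR
      encapsulation: VXLANCrossSubnet  # VXLAN between subnets, direct routing within one
      natOutgoing: Enabled
      nodeSelector: all()
```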
Check the download/deployment status
//wait until they are all Running
# kubectl get pod -n kube-system -o wide
# kubectl get nodes
Install Flannel (simpler: non-production environments)
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
Other possible causes
CNI not installed — check: kubectl get pods -n kube-system shows no network Pods — fix: install Calico/Flannel
kubelet cannot reach the API — check: journalctl -u kubelet shows connection errors — fix: check the firewall/certificates
Container runtime failure — check: sudo crictl ps fails — fix: restart containerd
Disk pressure / low memory — check: kubectl describe node shows DiskPressure — fix: free disk space / add capacity
kube-proxy not started — check: kubectl get pods -n kube-system shows no kube-proxy — fix: inspect the kube-proxy Pod
Summary: recommended troubleshooting flow
# 1. Inspect the node's detailed status
kubectl describe node vm-a
# 2. Check system Pod status (is there any network-related Pod?)
kubectl get pods -n kube-system -o wide
# 3. If you only see kube-apiserver/kube-controller-manager/kube-scheduler/etcd,
#    but no calico/flannel/coredns, the CNI is missing
# 4. Install a CNI (Calico, for example)
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
# 5. Wait for the Pods to come up
kubectl get pods -n kube-system -w
# 6. Check the node status (usually Ready after 1-2 minutes)
kubectl get nodes
Q: What about the Calico CNI network plugin?
Calico
Calico is a full-featured CNI. Besides Pod-to-Pod connectivity (via BGP, VXLAN, and other modes), it natively integrates Network Policy for fine-grained access control (restricting Pod-to-Pod traffic, blocking external access, and so on), which fully covers our requirements. Calico 3.28.2 is a stable release with broad compatibility (Kubernetes 1.24+) and excellent performance, making it the first choice for production.
Calico is an open-source layer-3 virtual networking solution that provides connectivity and policy control for cloud-native applications. Compared with Flannel, Calico's strength is Network Policy: it lets users dynamically define ACL rules over the traffic entering and leaving containers, applying security policies to Pod-to-Pod communication on demand. Beyond that, Calico integrates with most orchestration-capable environments and can provide multi-host networking for both VMs and containers.
Architecture diagram (image)
Flannel
Flannel is a lightweight CNI that connects Pods mainly via vxlan or host-gw mode; its strengths are simple deployment and a small resource footprint.
However, Flannel does not support Network Policy out of the box. Even when extended with extra plugins, the configuration is complex and the feature incomplete, so it cannot satisfy a hard requirement to enforce network policies and is ruled out here.
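As a concrete illustration of what Network Policy support buys you, this is the kind of object Calico can enforce but stock Flannel cannot: a default-deny-ingress policy for one namespace (the namespace name is hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: demo        # hypothetical namespace
spec:
  podSelector: {}        # an empty selector matches every Pod in the namespace
  policyTypes:
  - Ingress              # no ingress rules are listed, so all inbound traffic is denied
```

Without a policy-capable CNI, applying this object succeeds but has no effect; Calico actually enforces it.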
Q: How to uninstall an old CNI (flannel)?
# 1. Delete the old CNI's Pods and namespace (Flannel as the example)
kubectl delete ns kube-flannel
kubectl delete pod -n kube-system -l app=flannel
# 2. Clean the old CNI configuration on every node
rm -rf /etc/cni/net.d/*   # remove the CNI config files
systemctl restart kubelet # restart the kubelet so the change takes effect
Q: Why is the K8s cluster's internal Service IP (10.96.0.1) also pingable from the host's physical LAN (the 192.168 network)?
A classic networking puzzle. If 10.96.0.1 answers pings from the 192.168 network, some routing or bridging mechanism must be at work.
Possible causes
1. A route in the host's routing table
The host has a route for the 10.96.0.0/12 network:
//inspect the routing table on the host
# ip route | grep 10.96
(actual result: no output)
Other outputs you might see:
10.96.0.0/12 via 192.168.x.x dev eth0 # via some gateway
10.96.0.0/12 dev cni0 proto kernel # via the CNI bridge
2. Kubernetes network modes: host networking or port mapping
hostNetwork: true — the Pod uses the host's network stack directly
hostPort — a container port is mapped onto the host
NodePort Service — the service is exposed via <NodeIP>:<Port>
3. The CNI plugin's bridge mode
Common Kubernetes CNIs (such as Flannel, Calico, Weave) typically:
create virtual bridges on the host (such as cni0, docker0, flannel.1)
route the Pod and Service networks through the host
//inspect the host's bridges
# brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.024221f3205a no
virbr0 8000.52540013f1cb yes virbr0-nic
# ip link show type bridge
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:xx:xx:cb brd ff:ff:ff:ff:ff:ff
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:21:xx:xx:5a brd ff:ff:ff:ff:ff:ff
4. iptables / IPVS forwarding rules
Kube-proxy installed NAT or forwarding rules on the host:
//inspect the iptables rules for kubernetes services
# sudo iptables -t nat -L | grep 10.96.0.1
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- anywhere 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:https
# sudo iptables -t filter -L | grep 10.96
REJECT udp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.96.0.10 /* kube-system/kube-dns:metrics has no endpoints */ tcp dpt:9153 reject-with icmp-port-unreachable
5. MetalLB or a similar load balancer
If the cluster runs MetalLB, Service IPs may be exposed directly on the LAN:
in L2 mode MetalLB announces 10.96.0.1 to the LAN via ARP/NDP
making other devices on the LAN believe 10.96.0.1 is on the local network
How to confirm?
On the host, run:
//1. check the route taken
# ip route get 10.96.0.1
10.96.0.1 via 192.168.xx.1 dev ens33 src 192.168.xx.211
cache
// 2. see which hops the packets actually traverse
# traceroute 10.96.0.1
traceroute to 10.96.0.1 (10.96.0.1), 30 hops max, 60 byte packets
1 XiaoQiang (192.168.xx.1) 1.927 ms 2.570 ms 2.502 ms
2 192.168.1.1 (192.168.1.1) 3.097 ms 3.037 ms 2.963 ms
3 100.64.xx.1 (100.64.xx.1) 10.148 ms 10.071 ms 10.007 ms
4 183.222.xx.121 (183.222.xx.121) 6.912 ms 5.619 ms 183.222.xx.125 (183.222.xx.125) 5.784 ms
5 10.96.0.1 (10.96.0.1) 7.407 ms 7.334 ms 8.256 ms
// 3. check the LAN ARP table (IP-to-MAC mapping)
# arp 10.96.0.1
10.96.0.1 (10.96.0.1) -- no entry
// 4. check interface addresses
# ip addr | grep 10.96
(actual result: no output)
// 5. check the kube-proxy mode
# kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
mode: ""
// an empty mode string means kube-proxy falls back to its default, iptables
On other machines in the LAN:
# check the ARP table for 10.96.0.1's MAC address
arp -a | grep 10.96.0.1
# and compare it with the host's MAC address
arp -a | grep <host IP>
Most likely explanations, by cluster setup

Flannel/Calico with default config    the host acts as a router, forwarding the 10.96.0.0/12 traffic
MetalLB in L2 mode                    10.96.0.1 is ARP-announced straight onto the LAN
Hand-made routing                     a static route was configured on the router/host
kube-proxy in IPVS mode               virtual IPs were created on the host
Security note
If 10.96.0.1 is reachable from the whole LAN, the cluster's internal network is exposed beyond the cluster. Worth checking:
whether NetworkPolicy objects are configured correctly
whether Service exposure should be narrowed
whether firewall rules are too permissive
Final conclusion
Reading the traceroute path hop by hop:
192.168.xx.1 (the home router)
192.168.1.1 (the modem in router mode / an upstream router)
100.64.x.1 (carrier CGNAT shared address space; the broadband line sits behind carrier NAT with no public IP)
183.222.x.x (the carrier's public egress)
10.96.0.1 (the local carrier's BRAS, Broadband Remote Access Server, which: 1. authenticates the PPPoE dial-up; 2. assigns the access IP; 3. applies rate limits, policy, and billing; 4. is the line's real upstream gateway)
Key points:
this 10.96.0.1 is a carrier-internal device (BRAS, service router, core router, gateway, etc.)
carriers route private addresses inside their own network, so the packets reach it even though they left through the public egress
the address is visible only inside the carrier's network; nobody else on the Internet can reach it
So this 10.96.0.1 is neither a public host nor an attacker's node: it is simply the local carrier's core access device.
Pinging or tracerouting it is harmless, and is in fact a common way to test last-mile broadband quality.
To measure real Internet latency, do not ping 10.96.0.1; ping a public DNS resolver instead, e.g. 114.114.114.114, 223.5.5.5, or 8.8.8.8.
Q: What is the difference between the K8s Service range (default: 10.96.0.0/12) and the Pod range?
The difference is fundamental; these are the two core address ranges in Kubernetes. In one line each:
Pod range: real IPs given to the individual containers
Service range: stable "virtual IPs" given to services
1. The Pod range (podSubnet)
IPs for the Pods themselves.
Properties:
every Pod gets a real IP when it starts
the IP is dynamic and ephemeral: recreate the Pod and it changes
used for direct Pod-to-Pod communication
a genuinely routed L2/L3 address
directly pingable inside the cluster
Example:
10.244.0.0/16
Pod IPs might be 10.244.1.5, 10.244.2.8, and so on
2. The Service range (serviceSubnet)
The 10.96.0.0/12 you ran into is this one.
These are virtual IPs (ClusterIPs), not addresses of any real interface.
Properties:
assigned to Services, and fixed for the Service's lifetime
not bound to any real container or NIC
exist to provide load balancing and service discovery
forwarded by kube-proxy via iptables/IPVS
you reach the service's ports; you cannot ping "through" to the Pods behind it
Example:
10.96.0.0/12
Service IPs such as 10.96.0.1 (kubernetes) and 10.96.0.10 (coredns)
// try to reach CoreDNS at 10.96.0.10 from the host (result: unreachable)
# traceroute 10.96.0.10
(fails, which confirms the cluster's 10.x private range is not actually exposed to the host's LAN)
3. The differences at a glance

                      Pod range                     Service range
Who uses it           every Pod                     every Service
Real IP?              real, backed by a NIC         virtual, no NIC behind it
Stable?               changes on every rebuild      fixed once created
Purpose               direct Pod-to-Pod traffic     stable entry point + load balancing
Directly pingable?    yes                           the VIP may answer, but that proves nothing about the backing Pods
Network model         real routing                  NAT forwarding (iptables/IPVS)
4. How does the traffic flow? (a simple analogy)
Pod IP = a person's mobile number (changes when they change phones)
Service IP = the company switchboard number (never changes)
You dial the switchboard (Service IP) → it connects you to some person (Pod IP)
5. Why the conflict in this case?
Because:
the carrier's internal network happens to use 10.96.x.x
K8s defaults its Service range to 10.96.0.0/12 as well
the overlapping routes cause odd behavior, and traceroute wanders off to the carrier's equipment
The fix:
move the Service range to something that does not overlap 10.96, for example:
10.255.0.0/16
172.31.0.0/16
192.168.100.0/20
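The overlap is easy to check mechanically. The helper below is hypothetical (written for this note, not part of K8s): it tests whether an IPv4 address falls inside a CIDR block using only shell integer arithmetic, showing that the carrier's 10.96.x.x gear collides with the default serviceSubnet while the suggested replacements stay clear:

```shell
# Hypothetical helper: does an IPv4 address fall inside a CIDR block?
ip_to_int() {
  local IFS=.
  set -- $1                                 # split a.b.c.d into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}
# 10.96.0.0/12 spans 10.96.0.0 - 10.111.255.255:
in_cidr 10.96.0.1  10.96.0.0/12 && echo "10.96.0.1 collides with the default serviceSubnet"
in_cidr 10.255.0.1 10.96.0.0/12 || echo "10.255.0.0/16 does not overlap"
```

Running the same check against 172.31.0.0/16 or 192.168.100.0/20 confirms they also fall outside the /12.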
Q: How do I change the default 10.96.x.x K8s Service range to something else? // TODO
Problem statement
// look at the range used by the CLUSTER-IP column
# kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 25h
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 25h
What needs changing is the 10.96.0.0/12 Service range (service-cluster-ip-range / serviceSubnet), not the Pod range.
Solution
In short: for kubeadm clusters the Service range is fixed at kubeadm init time (serviceSubnet / --service-cidr); on a running cluster you would have to change --service-cluster-ip-range in the control-plane manifests and recreate the affected Services.
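A hedged sketch of the usual approach, assuming a kubeadm-built cluster that can be re-initialized (all values below are illustrative):

```shell
# Sketch only: generate a kubeadm config whose serviceSubnet avoids
# the 10.96.0.0/12 range, for use at `kubeadm init` time.
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.0
imageRepository: registry.aliyuncs.com/google_containers
networking:
  serviceSubnet: 10.255.0.0/16   # instead of the default 10.96.0.0/12
  podSubnet: 10.244.0.0/16
EOF
grep Subnet kubeadm-config.yaml    # sanity-check the two ranges
# kubeadm init --config kubeadm-config.yaml   # run on a fresh/reset control plane
```

On a cluster that cannot be rebuilt, the same knob lives in the --service-cluster-ip-range flag of the kube-apiserver and kube-controller-manager static manifests under /etc/kubernetes/manifests/; existing Services keep their old ClusterIPs until they are recreated.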
Q: How do I list all container images, across all namespaces, with kubectl?
https://kubernetes.io/zh-cn/docs/tasks/access-application-cluster/list-all-running-container-images/
# kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" |\
tr -s '[[:space:]]' '\n' |\
sort |\
uniq -c
// output:
1 busybox
1 docker.io/calico/kube-controllers:v3.26.1
1 docker.io/calico/node:v3.26.1
2 registry.aliyuncs.com/google_containers/coredns:v1.10.1
1 registry.aliyuncs.com/google_containers/etcd:3.5.9-0
1 registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.0
1 registry.aliyuncs.com/google_containers/kube-controller-manager:v1.28.0
1 registry.aliyuncs.com/google_containers/kube-proxy:v1.28.0
1 registry.aliyuncs.com/google_containers/kube-scheduler:v1.28.0
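The tr / sort / uniq -c tail of that pipeline can be tried without a cluster by feeding it a hand-made, space-separated image list (the image names below are just sample data):

```shell
# Simulate the space-separated image list kubectl emits, then run the same
# dedup-and-count pipeline as above.
printf 'busybox nginx:1.25 busybox coredns:v1.10.1' |
  tr -s '[[:space:]]' '\n' |
  sort |
  uniq -c
# prints three lines: "2 busybox", "1 coredns:v1.10.1", "1 nginx:1.25"
# (counts left-padded by uniq -c)
```

tr -s squeezes runs of whitespace into single newlines, sort groups identical names, and uniq -c collapses each group into a count.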
docker (both lists below are empty, as expected: this cluster runs on the containerd CRI, so Docker manages none of the K8s containers)
# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Y Recommended reading
K8s overview - 博客园/千千寰宇
[Docker] Docker basics (concepts/principles/basic operations) - 博客园/千千寰宇
[Docker] Installing Docker on CentOS 7 - 博客园/千千寰宇
[Docker] Docker Compose basics (concepts/basic operations) - 博客园/千千寰宇
X References
Installing Kubernetes (k8s) on Linux, a detailed tutorial - CSDN
Installing kubernetes 1.16.2 with kubeadm - 博客园 [recommended]
kubeadm-config explained - CSDN
K8s cluster CNI upgrade: a complete hands-on guide to installing Calico 3.28.2 - 博客园
Flannel and Calico are both mainstream Kubernetes CNIs, but they differ on one key capability: support for network policies
Installing and deploying the Calico network for a kubernetes (k8s) cluster - CSDN [recommended x4]
