Troubleshooting "no route to host" during an Istio installation

半日闲, November 11, 2020

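Problem

When installing Istio with istioctl, the istiod pod was Running, but the ingressgateway pod never reached Running. The pod log then showed a message asking whether Pilot is running ("is Pilot running"), as shown below: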

image-20201111212131617

Accessing the svc with curl kept returning "no route to host".

System environment

CentOS Linux (3.10.0-1062.el7.x86_64) 7 (Core)

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:48:36Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

istioctl version

client version: 1.5.2
control plane version: 1.5.2
data plane version: 1.5.2 (2 proxies)

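Fault description

After the pod was killed and recreated, the svc looked normal and the pod IPs in its endpoints had been updated, but the IPs in the IPVS routing rules had not been updated.

As shown below: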

image-20201110214421336
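
The mismatch described here (endpoints updated, IPVS real servers not) can also be checked from the command line instead of by eye. The following is only a sketch: `stale_backends` and `check_service` are helper names I made up for illustration, and it assumes `kubectl` and `ipvsadm` are available on the node and that the service behind the VIP is known.

```shell
# Print IPVS real-server IPs that are not among the current endpoint IPs.
# $1: whitespace-separated endpoint IPs
# $2: whitespace-separated IPVS real-server IPs
stale_backends() {
  for ip in $2; do
    case " $1 " in
      *" $ip "*) ;;        # still a live endpoint, nothing to report
      *) echo "$ip" ;;     # only known to IPVS, so probably stale
    esac
  done
}

# Hypothetical wrapper: $1 namespace, $2 service name, $3 VIP:port of the service.
check_service() {
  # Endpoint IPs as Kubernetes currently sees them.
  eps=$(kubectl -n "$1" get endpoints "$2" \
        -o jsonpath='{.subsets[*].addresses[*].ip}')
  # Real-server IPs from the IPVS table (lines starting with "->", port stripped).
  rs=$(ipvsadm -Ln -t "$3" |
       awk '$1 == "->" && $2 ~ /^[0-9]/ { sub(/:[0-9]+$/, "", $2); print $2 }')
  stale_backends "$eps" "$rs"
}
```

For example, `check_service istio-system istiod 10.68.8.43:15012` would print any IP that IPVS still forwards to but that is no longer an endpoint. The service name `istiod` for that VIP is my assumption based on the port; substitute whatever `kubectl get svc -A` shows for the address.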

IPVS rules are maintained by kube-proxy, so the next step was to check the kube-proxy log, shown below:

image-20201111204153180

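Cause

  1. After the pod restarted it was assigned a new IP, but the IPVS routing rules still pointed at the old IP, so every request failed

  2. The kube-proxy log contained a large number of errors, as follows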

    Nov 11 07:38:53 a.b kube-proxy[15412]: E1111 07:38:53.427608   15412 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[172 20 0 34 0 0 0 0 0 0 0 0 0 0 0 0]
    Nov 11 07:38:53 a.b kube-proxy[15412]: E1111 07:38:53.427670   15412 proxier.go:1192] Failed to sync endpoint for service: 10.68.8.43:15012/TCP, err: parseIP Error ip=[172 20 0 34 0 0 0 0 0 0 0 0 0 0 0 0]
    Nov 11 07:38:53 a.b kube-proxy[15412]: E1111 07:38:53.429901   15412 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[192 168 1 8 0 0 0 0 0 0 0 0 0 0 0 0]
    Nov 11 07:38:53 a.b kube-proxy[15412]: E1111 07:38:53.429975   15412 proxier.go:1192] Failed to sync endpoint for service: 10.68.0.1:443/TCP, err: parseIP Error ip=[192 168 1 8 0 0 0 0 0 0 0 0 0 0 0 0]
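
The `ip=[...]` part of these errors is a 16-byte array, which is hard to read. A small helper can turn it back into dotted form so it is obvious which addresses kube-proxy is choking on. This is a sketch only: `decode_parseip` is a name I made up, and reading the first four bytes as an IPv4 address is my interpretation of the log line.

```shell
# Turn "ip=[172 20 0 34 0 0 ...]" from a kube-proxy log line into "172.20.0.34".
# $1: one log line containing a parseIP error
decode_parseip() {
  printf '%s\n' "$1" |
    sed -n 's/.*ip=\[\([0-9]*\) \([0-9]*\) \([0-9]*\) \([0-9]*\) .*/\1.\2.\3.\4/p'
}

# Hypothetical wrapper: list the distinct addresses from the journal.
# Assumes kube-proxy runs as a systemd unit named "kube-proxy".
list_parseip_addresses() {
  journalctl -u kube-proxy --no-pager |
    grep 'parseIP Error' |
    while IFS= read -r line; do decode_parseip "$line"; done |
    sort -u
}
```

Applied to the lines above, `decode_parseip` gives 172.20.0.34 and 192.168.1.8, which can then be compared with the pod and node IPs in the cluster.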
    

Solution

Upgrade the system kernel to 4.x or later

  1. Import the GPG key and install the repository

    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org && yum install -y https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
    
  2. List the available kernel versions

    yum list available --disablerepo='*' --enablerepo=elrepo-kernel
    

    image-20201111210857355

  3. Install the 4.x kernel

    yum install -y kernel-lt-4.4.242-1.el7.elrepo --enablerepo=elrepo-kernel
    
  4. Confirm that the kernel was installed

    grep menuentry /boot/grub2/grub.cfg
    

    image-20201111211017701

  5. Set the default kernel

    grub2-set-default "CentOS Linux (4.4.242-1.el7.elrepo.x86_64) 7 (Core)"
    
  6. Confirm that the default kernel was set

    grub2-editenv list
    

    image-20201111211156134

  7. Reboot

    reboot
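
Once the machine is back up, it is worth confirming that it actually booted the new kernel before looking at kube-proxy again. A minimal sketch, where `kernel_is_4x_or_later` is a helper name I made up and the check only looks at the major version number:

```shell
# Succeeds (exit status 0) if the release string has a major version >= 4.
# $1: a kernel release string such as the output of `uname -r`
kernel_is_4x_or_later() {
  major=${1%%.*}     # text before the first dot, e.g. "3" from "3.10.0-1062.el7.x86_64"
  [ "$major" -ge 4 ]
}

if kernel_is_4x_or_later "$(uname -r)"; then
  echo "kernel $(uname -r): OK"
else
  echo "kernel $(uname -r): still older than 4.x, check the GRUB default entry"
fi
```

If this reports the old 3.10 kernel, the GRUB default from step 5 probably did not take effect, and `grub2-editenv list` is the place to look.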
    

Notes:

Other troubleshooting write-ups worth consulting

https://blog.csdn.net/cw03192/article/details/107105371/

https://cloud.tencent.com/developer/article/1554172

Official source for the diagnosis

https://github.com/kubernetes/kubernetes/issues/89520