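Best Practices for Consul on Kubernetes in Production

This guide covers how to run the Consul agent (client) in K8s; the Consul servers are best run on physical machines.
For how to install Consul, see my other blog post, Consul Cluster Installation; it is not covered in detail here.

This setup has been validated in production, and no problems have been found so far.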

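Problems with running the Consul agent in Kubernetes and how to address them

Problems

  1. How should business services connect to the Consul agent? (In Consul, a service must be deregistered through the same client agent it was registered with.)
  2. On startup, the Consul agent generates its node-id and other metadata under the data directory, based on the hostname, IP address, and similar information. If the data directory is not persisted and the host network is not used, the hostname and IP address change whenever the Pod is updated. Consul can then end up with one IP address mapped to two hostnames, and service registration breaks.
    1. We have hit case 2 several times in production: a colleague renamed a host, which left one IP address mapped to two hostnames in the Consul cluster and caused services to misbehave.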
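
The duplicate mapping described in problem 2 can be spotted from the member list. The following is a minimal sketch and not part of the original setup. It assumes the usual `consul members` table layout (a header row, the node name in column 1, an IPv4 `IP:port` pair in column 2), and `duplicate_ips` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read `consul members` output on stdin and print every IP address
# that is listed under more than one node name.
# Assumption: column 1 is the node name, column 2 is an IPv4 "IP:port" pair.
duplicate_ips() {
    awk 'NR > 1 {                       # skip the header row
        split($2, parts, ":")           # "10.0.0.1:8301" -> "10.0.0.1"
        ip = parts[1]
        if (!((ip, $1) in seen)) {      # count each (ip, node) pair once
            seen[ip, $1] = 1
            count[ip]++
            names[ip] = names[ip] " " $1
        }
    }
    END {
        for (ip in count)
            if (count[ip] > 1)
                print ip ":" names[ip]
    }'
}
```

It would typically be run as `consul members | duplicate_ips`; an empty result means no IP address is shared by two node names.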

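Solutions

  1. Run the Consul agent as a DaemonSet with the host network (hostNetwork), so that the hostname and IP address stay the same. Persist Consul's metadata to a directory on the host; when the Consul Pod is updated, the agent reads this directory again instead of regenerating the node-id and other metadata.
  2. Inject the Consul agent's IP address (that is, the physical host's IP address) into the Deployment through an environment variable. When a program needs to connect to Consul, it reads the value from its own environment and connects.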
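
To make the second point concrete, here is a rough sketch of registering and deregistering a service through the node-local agent, so that both calls go to the same client. It uses Consul's HTTP endpoints `/v1/agent/service/register` and `/v1/agent/service/deregister/<id>`; the function names and the service ID, name, address, and port passed to them are hypothetical, and `CONSUL_HTTP_ADDR` is assumed to hold `host:port` without a scheme.

```shell
#!/bin/sh
# Sketch: talk to the Consul agent on the same node via CONSUL_HTTP_ADDR.
# All function names here are illustrative, not part of the original setup.

# Build the URL of an agent service endpoint, e.g. "register".
consul_api() {
    printf 'http://%s/v1/agent/service/%s' \
        "${CONSUL_HTTP_ADDR:?CONSUL_HTTP_ADDR is not set}" "$1"
}

# Build a minimal registration body: id, name, address, port.
register_payload() {
    printf '{"ID":"%s","Name":"%s","Address":"%s","Port":%s}' "$1" "$2" "$3" "$4"
}

# Register through the local agent.
register_service() {
    curl -fsS -X PUT --data "$(register_payload "$@")" "$(consul_api register)"
}

# Deregister through the same local agent, by service ID.
deregister_service() {
    curl -fsS -X PUT "$(consul_api "deregister/$1")"
}
```

A service could call `register_service business-1 business "$POD_IP" 8999` on startup and `deregister_service business-1` on shutdown; because both use the injected address, they reach the agent on the node where the Pod runs.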

Configuration

ConfigMap configuration

~]# cat consul-client-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: consul-client
  namespace: consul
data:
  consul.json: |
    {
        "datacenter": "dc1",
        "client_addr": "0.0.0.0",
        "bind_addr": "{{ GetInterfaceIP \"eth0\" }}",
        "data_dir": "/consul/data",
        "retry_interval": "20s",
        "retry_join": ["10.111.67.1","10.111.67.2","10.111.67.3","10.111.67.4","10.111.67.5"],
        "enable_local_script_checks": true,
        "log_file": "/var/log/",
        "log_level": "trace",
        "pid_file": "/var/run/consul.pid",
        "performance": {
            "raft_multiplier": 1
        },
        "telemetry": {
            "prometheus_retention_time": "300s",
            "disable_hostname": true
        }
    }
  create-consul-registration.sh: |
    #!/bin/sh
    ADDR=`ip addr show|awk -F '[ /]+' '/eth[0-9]|em[0-9]/ && /inet/ {print $3}'`
    CONSUL_CONF_DIR='/consul/config'
    CONSUL_REGISTER_FILE="$CONSUL_CONF_DIR/consul-members-registration.json"

    if [ -n "$ADDR" ] && [ -d "$CONSUL_CONF_DIR" ]; then
    cat > "${CONSUL_REGISTER_FILE}" <<-EOF
    {
        "service": {
            "id": "consul-${ADDR}",
            "name": "consul-members",
            "tags": [
                "prometheus",
                "client",
                "consul-client"
            ],
            "address": "${ADDR}",
            "port": 8500,
            "check": {
                "http": "http://${ADDR}:8500",
                "interval": "60s"
            }
        }
    }
    EOF
    else
            echo "ip address is empty or the $CONSUL_CONF_DIR does not exist"
    fi
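  1. consul.json is the Consul configuration file.
  2. create-consul-registration.sh generates the service definition that registers the agent automatically; it is mainly used for monitoring Consul.
    For Consul monitoring, see Consul Prometheus Monitoring.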
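
One detail of the registration script is worth illustrating: its awk expression prints every matching address, so a node with several ethN/emN interfaces would yield a multi-line ADDR. The sketch below is one possible way to keep only the first IPv4 address. `first_ipv4` is a hypothetical helper, and it reads the `ip addr show` text from stdin so it can be tried against saved output.

```shell
#!/bin/sh
# Sketch: print only the first IPv4 address found on an ethN/emN interface.
# Field splitting matches the original script: with -F '[ /]+' an indented
# "inet 10.0.0.5/24 ..." line has an empty $1, "inet" as $2, the address as $3.
first_ipv4() {
    awk -F '[ /]+' '/(eth|em)[0-9]/ && /inet / { print $3; exit }'
}
```

Used as `ADDR=$(ip addr show | first_ipv4)`. The trailing space in `/inet /` skips `inet6` lines, and `exit` stops after the first match.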

DaemonSet configuration

~]# cat consul-client-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: consul-client
  namespace: consul
  labels:
    app: consul
    environment: prod
    component: client
spec:
  minReadySeconds: 60
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: consul
      environment: prod
      component: client
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      namespace: consul
      labels:
        app: consul
        environment: prod
        component: client
    spec:
      containers:
      - env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        name: consul-client
        image: consul:1.5.1
        imagePullPolicy: IfNotPresent
        command:
        - "consul"
        - "agent"
        - "-config-dir=/consul/config"
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                /consul/create-consul-registration.sh
                consul reload
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - consul leave
        ports:
        - name: http-api
          hostPort: 8500
          containerPort: 8500
          protocol: TCP
        - name: dns-tcp
          hostPort: 8600
          containerPort: 8600
          protocol: TCP
        - name: dns-udp
          hostPort: 8600
          containerPort: 8600
          protocol: UDP
        - name: server-rpc
          hostPort: 8300
          containerPort: 8300
          protocol: TCP
        - name: serf-lan-tcp
          hostPort: 8301
          containerPort: 8301
          protocol: TCP
        - name: serf-lan-udp
          hostPort: 8301
          containerPort: 8301
          protocol: UDP
        - name: serf-wan-tcp
          hostPort: 8302
          containerPort: 8302
          protocol: TCP
        - name: serf-wan-udp
          hostPort: 8302
          containerPort: 8302
          protocol: UDP
        volumeMounts:
        - name: consul-config
          mountPath: /consul/config/consul.json
          subPath: consul.json
        - name: consul-members
          mountPath: /consul/create-consul-registration.sh
          subPath: create-consul-registration.sh
        - name: consul-data-dir
          mountPath: /consul/data
        - name: localtime
          mountPath: /etc/localtime
        livenessProbe:
          tcpSocket:
            port: 8500
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /v1/status/leader
            port: 8500
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        resources:
          requests:
            memory: "1024Mi"
            cpu: "1000m"
          limits:
            memory: "1024Mi"
            cpu: "1000m"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      hostNetwork: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: consul-config
        configMap:
          name: consul-client
          items:
          - key: consul.json
            path: consul.json
      - name: consul-members
        configMap:
          name: consul-client
          defaultMode: 0755
          items:
          - key: create-consul-registration.sh
            path: create-consul-registration.sh
      - name: consul-data-dir
        hostPath:
          path: /data/consul/data
          type: DirectoryOrCreate
      - name: localtime
        hostPath:
          path: /etc/localtime
          type: File
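  1. command overrides Consul's default startup arguments.
  2. lifecycle.postStart runs the service self-registration script after startup, then reloads the agent.
  3. lifecycle.preStop removes the agent from the Consul cluster before it stops.
  4. hostNetwork makes the Pod use the host's network namespace.
  5. The localtime volume gives the container the host's time zone (the default image presumably uses UTC).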
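
Whether the persisted metadata really survives a Pod update can be checked on the node itself. This is a minimal sketch under two assumptions: that Consul keeps the node ID in a file named `node-id` inside its data directory, and that the hostPath from the DaemonSet above (`/data/consul/data`) is in use. Both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: read the persisted node ID from the host directory that backs
# /consul/data. Assumption: the agent stores it in a file named "node-id".
read_node_id() {
    data_dir=${1:-/data/consul/data}
    if [ -r "$data_dir/node-id" ]; then
        cat "$data_dir/node-id"
    else
        echo "no node-id under $data_dir" >&2
        return 1
    fi
}

# Succeeds only when both IDs are non-empty and identical.
same_node_id() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

Capturing `before=$(read_node_id)` ahead of a rolling update and `after=$(read_node_id)` once the new Pod is ready, `same_node_id "$before" "$after"` should succeed if the agent reused its metadata.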

Deployment configuration

~]# cat deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: business
    environment: prod
    release: release
  name: business
  namespace: prod-platform
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: business
      environment: prod
      release: release
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: business
        environment: prod
        release: release
    spec:
      shareProcessNamespace: true
      containers:
      - env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: CONSUL_HTTP_ADDR
          value: "$(HOST_IP):8500"
        image: registry-vpc.cn-hangzhou.aliyuncs.com/prod/prod-business:v1
        imagePullPolicy: Always
        name: usercancel
        ports:
        - containerPort: 8999
        - containerPort: 9988
        livenessProbe:
          tcpSocket:
            port: 8999
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /health
            port: 8999
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1024Mi"
            cpu: "1000m"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: data-vol
          mountPath: /logs
          subPath: logs
        - name: data-vol
          mountPath: /coredump
          subPath: coredump
      - env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: registry-vpc.cn-hangzhou.aliyuncs.com/devops/filebeat:7.4.2-1
        imagePullPolicy: IfNotPresent
        name: filebeat
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: filebeat-config
          mountPath: /usr/share/filebeat/filebeat.yml
          subPath: filebeat.yml
        - name: data-vol
          mountPath: /logs
          subPath: logs
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: data-vol
        persistentVolumeClaim:
          claimName: pvc-nas-prod-platform-business
      - name: filebeat-config
        configMap:
          name: business
          items:
          - key: filebeat.yml
            path: filebeat.yml
  1. containers.env.name.CONSUL_HTTP_ADDR holds the address of the Consul agent; a service only needs to read the CONSUL_HTTP_ADDR environment variable to obtain it.
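
As an illustration of the consuming side, a container entrypoint might validate the injected variable before starting the service. The sketch below assumes the value has the `host:port` form set in the Deployment above (`$(HOST_IP):8500`); the three function names are hypothetical.

```shell
#!/bin/sh
# Sketch: fail fast when CONSUL_HTTP_ADDR was not injected or is malformed.
require_consul_addr() {
    case "${CONSUL_HTTP_ADDR:-}" in
        *:*) printf '%s\n' "$CONSUL_HTTP_ADDR" ;;
        *)   echo "CONSUL_HTTP_ADDR must look like host:port" >&2
             return 1 ;;
    esac
}

# Host part: everything before the last colon.
agent_host() {
    addr=$(require_consul_addr) || return 1
    printf '%s\n' "${addr%:*}"
}

# Port part: everything after the last colon.
agent_port() {
    addr=$(require_consul_addr) || return 1
    printf '%s\n' "${addr##*:}"
}
```

An entrypoint could run `require_consul_addr >/dev/null || exit 1` before launching the program, so that a missing variable surfaces at startup rather than at the first Consul call.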