PRP Kubernetes Cluster

December 3, 2017 | Docker GPU HPC Machine Learning Kubernetes

PRP (Pacific Research Platform) has recently deployed a distributed Kubernetes cluster as part of the NSF-funded CHASE-CI project. In this post, we’ll take a look at its architecture.

Nodes

Show the cluster info, using the admin Kubeconfig:

$ kubectl cluster-info
Kubernetes master is running at https://67.58.53.146:6443
Heapster is running at https://67.58.53.146:6443/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://67.58.53.146:6443/api/v1/namespaces/kube-system/services/kube-dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

List all the nodes in the Kubernetes cluster, using the admin config:

$ kubectl get nodes
NAME                                  STATUS    ROLES     AGE       VERSION
coreos-01.calit2.optiputer.net        Ready     <none>    97d       v1.8.0
coreos-02.calit2.optiputer.net        Ready     <none>    97d       v1.8.0
coreos-03.calit2.optiputer.net        Ready     <none>    97d       v1.8.0
coreos-04.calit2.optiputer.net        Ready     <none>    97d       v1.8.0
coreos-05.calit2.optiputer.net        Ready     <none>    97d       v1.8.0
coreos-06.calit2.optiputer.net        Ready     <none>    97d       v1.8.0
coreos-07.calit2.optiputer.net        Ready     <none>    97d       v1.8.0
coreos-08.calit2.optiputer.net        Ready     <none>    97d       v1.8.0
fiona8.calit2.uci.edu                 Ready     <none>    36d       v1.8.0
k8s-gpu-01.calit2.optiputer.net       Ready     <none>    97d       v1.8.1
k8s-gpu-02.calit2.optiputer.net       Ready     <none>    97d       v1.8.0
k8s-nvme-01.sdsc.optiputer.net        Ready     <none>    55d       v1.8.1
ps-100g.sdsu.edu                      Ready     <none>    97d       v1.8.0
sdx-controller.calit2.optiputer.net   Ready     master    97d       v1.8.4
siderea.ucsc.edu                      Ready     <none>    36d       v1.8.3

The master node is sdx-controller.calit2.optiputer.net, whose public IP address is 67.58.53.146. Note that this public IP address is also the node’s InternalIP within the Kubernetes cluster:

$ kubectl describe nodes sdx-controller.calit2.optiputer.net
Roles:              master
Taints:             node-role.kubernetes.io/master:NoSchedule
Addresses:
  InternalIP:  67.58.53.146
  Hostname:    sdx-controller.calit2.optiputer.net
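
The same information can be pulled for every node at once with a jsonpath query (just a convenience; the output format below is arbitrary):

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'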

Contrary to what some hostnames, e.g., coreos-01.calit2.optiputer.net, may appear to imply, all nodes run CentOS 7:

[root@coreos-01 ~]# cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)

but they use the “mainline stable” kernel-ml provided by ELRepo:

[root@coreos-01 ~]# uname -a
Linux coreos-01.calit2.optiputer.net 4.13.11-1.el7.elrepo.x86_64 #1 SMP Thu Nov 2 11:29:36 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@coreos-01 ~]# rpm -qi kernel-ml
Name        : kernel-ml
Version     : 4.13.11
Release     : 1.el7.elrepo
Architecture: x86_64
Install Date: Fri 03 Nov 2017 12:53:40 PM PDT
Group       : System Environment/Kernel
Size        : 198868228
License     : GPLv2
Signature   : DSA/SHA1, Thu 02 Nov 2017 09:58:15 AM PDT, Key ID 309bc305baadae52
Source RPM  : kernel-ml-4.13.11-1.el7.elrepo.src.rpm
Build Date  : Thu 02 Nov 2017 09:48:38 AM PDT
Build Host  : Build64R7
Relocations : (not relocatable)
Packager    : Alan Bartlett <ajb@elrepo.org>
Vendor      : The ELRepo Project (http://elrepo.org)
URL         : https://www.kernel.org/
Summary     : The Linux kernel. (The core of any Linux-based operating system.)
Description :
This package provides the Linux kernel (vmlinuz), the core of any
Linux-based operating system. The kernel handles the basic functions
of the OS: memory allocation, process allocation, device I/O, etc.
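
For reference, kernel-ml comes from ELRepo’s elrepo-kernel repository. Installing it looks roughly like this (a sketch only, not the exact provisioning steps used on these nodes; the elrepo-release package name and version change over time):

[root@coreos-01 ~]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
[root@coreos-01 ~]# yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
[root@coreos-01 ~]# yum --enablerepo=elrepo-kernel install kernel-ml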

As of this writing (December 3, 2017), there are 4 GPU nodes in the cluster, each of which has eight Nvidia GeForce GTX 1080 GPUs.
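
Workloads target these GPUs through resource requests on their pods. Below is a hedged sketch (the pod name and image are made up; the nvidia.com/gpu resource name assumes the NVIDIA device plugin is installed, whereas a bare Kubernetes 1.8 cluster would instead use the older alpha.kubernetes.io/nvidia-gpu resource):

$ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda
    image: nvidia/cuda:9.0-base
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF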

Control Plane

The Control Plane runs on the master node; it is what controls the whole Kubernetes cluster and makes it function. The components that make up the Control Plane are:

  • the etcd distributed persistent storage
  • the API server
  • the Scheduler
  • the Controller Manager

List the Control Plane components and their status:

$ kubectl get componentstatuses
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}

The Control Plane components, as well as kube-proxy, can either be deployed on the system directly or run as pods. List all pods in the kube-system namespace:

$ kubectl get pods -o custom-columns=POD:metadata.name,NODE:spec.nodeName \
          --sort-by spec.nodeName -n kube-system
POD                                                           NODE
traefik-ingress-controller-cn2j4                              coreos-01.calit2.optiputer.net
heapster-79c9d5f97-7qfk4                                      coreos-01.calit2.optiputer.net
kube-proxy-trvrb                                              coreos-01.calit2.optiputer.net
kube-router-7rqvl                                             coreos-01.calit2.optiputer.net
oidc-auth-1834198362-rwcsk                                    coreos-02.calit2.optiputer.net
oidc-auth-dev-7cbd7f56c5-fwjhn                                coreos-02.calit2.optiputer.net
kube-proxy-rqjff                                              coreos-02.calit2.optiputer.net
kube-router-hdjw5                                             coreos-02.calit2.optiputer.net
traefik-ingress-controller-cs6qj                              coreos-02.calit2.optiputer.net
traefik-ingress-controller-pfcbx                              coreos-03.calit2.optiputer.net
kube-proxy-gt5w9                                              coreos-03.calit2.optiputer.net
kube-router-zdwvw                                             coreos-03.calit2.optiputer.net
tiller-deploy-5588f6c684-d6j6g                                coreos-04.calit2.optiputer.net
kube-proxy-448rp                                              coreos-04.calit2.optiputer.net
kube-dns-d6d4674ff-9z8t5                                      coreos-04.calit2.optiputer.net
traefik-ingress-controller-nhc69                              coreos-04.calit2.optiputer.net
kube-router-q42cc                                             coreos-04.calit2.optiputer.net
traefik-ingress-controller-r4q88                              coreos-05.calit2.optiputer.net
kube-dns-d6d4674ff-lb9ff                                      coreos-05.calit2.optiputer.net
kube-proxy-qdkzg                                              coreos-05.calit2.optiputer.net
kube-router-57zbh                                             coreos-05.calit2.optiputer.net
kubernetes-dashboard-759ccc86d9-bf7sk                         coreos-06.calit2.optiputer.net
kube-router-gnrhk                                             coreos-06.calit2.optiputer.net
traefik-ingress-controller-vrqzt                              coreos-06.calit2.optiputer.net
kube-proxy-z4q07                                              coreos-06.calit2.optiputer.net
kube-router-26rwg                                             coreos-07.calit2.optiputer.net
traefik-ingress-controller-5mbdm                              coreos-07.calit2.optiputer.net
kube-proxy-c0h05                                              coreos-07.calit2.optiputer.net
traefik-ingress-controller-fm7ms                              coreos-08.calit2.optiputer.net
kube-router-5sqs2                                             coreos-08.calit2.optiputer.net
kube-proxy-hsv56                                              coreos-08.calit2.optiputer.net
kube-router-jf974                                             fiona8.calit2.uci.edu
traefik-ingress-controller-hgwkf                              fiona8.calit2.uci.edu
kube-proxy-55lkt                                              fiona8.calit2.uci.edu
kube-proxy-cm9jz                                              k8s-gpu-01.calit2.optiputer.net
kube-router-ftlfb                                             k8s-gpu-01.calit2.optiputer.net
traefik-ingress-controller-bbpjh                              k8s-gpu-01.calit2.optiputer.net
traefik-ingress-controller-xw89x                              k8s-gpu-02.calit2.optiputer.net
kube-router-m7gh7                                             k8s-gpu-02.calit2.optiputer.net
kube-proxy-lw5jv                                              k8s-gpu-02.calit2.optiputer.net
traefik-ingress-controller-t69np                              k8s-nvme-01.sdsc.optiputer.net
kube-router-mkg7g                                             k8s-nvme-01.sdsc.optiputer.net
kube-proxy-pwnpb                                              k8s-nvme-01.sdsc.optiputer.net
traefik-ingress-controller-xx9j2                              ps-100g.sdsu.edu
kube-proxy-ps8pl                                              ps-100g.sdsu.edu
kube-router-kptgn                                             ps-100g.sdsu.edu
kube-scheduler-sdx-controller.calit2.optiputer.net            sdx-controller.calit2.optiputer.net
kube-router-8m25r                                             sdx-controller.calit2.optiputer.net
kube-proxy-7h6qs                                              sdx-controller.calit2.optiputer.net
kube-apiserver-sdx-controller.calit2.optiputer.net            sdx-controller.calit2.optiputer.net
kube-controller-manager-sdx-controller.calit2.optiputer.net   sdx-controller.calit2.optiputer.net
etcd-sdx-controller.calit2.optiputer.net                      sdx-controller.calit2.optiputer.net
kube-proxy-6x4rp                                              siderea.ucsc.edu
kube-router-7lm9v                                             siderea.ucsc.edu
traefik-ingress-controller-mmkcs                              siderea.ucsc.edu

We can see that the Control Plane components are running as pods on the master node.
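
To double-check, filter the wide pod listing by the master’s hostname:

$ kubectl get pods -n kube-system -o wide | grep sdx-controller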

Components running on the Worker Nodes

The following components are running on each worker node:

  • Kubelet
  • Kubernetes Service Proxy (kube-proxy)
  • Container Runtime (Docker)

The Kubelet and Docker are the only components that always run as regular system services; it is the Kubelet that then runs all the other components as pods. Because the Control Plane components run as pods, the Kubelet is also deployed on the master node:

[root@sdx-controller ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2017-11-21 12:38:30 PST; 3 weeks 2 days ago
     Docs: http://kubernetes.io/docs/
 Main PID: 13796 (kubelet)
   Memory: 111.5M
   CGroup: /system.slice/kubelet.service
           └─13796 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bo...
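
The 10-kubeadm.conf drop-in suggests the cluster was bootstrapped with kubeadm. In that setup, the Control Plane pods on the master are static pods: the Kubelet watches a manifest directory on disk and runs whatever pod definitions it finds there. On a kubeadm master the directory would typically look like this (an illustrative listing, not captured from this node):

[root@sdx-controller ~]# ls /etc/kubernetes/manifests
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml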

Every node runs kube-proxy, whose purpose is to make sure clients can connect to the Services defined through the Kubernetes API. kube-proxy is deployed as a DaemonSet:

$ kubectl describe daemonsets kube-proxy -n kube-system
Name:           kube-proxy
Selector:       k8s-app=kube-proxy
Node-Selector:  <none>
Labels:         k8s-app=kube-proxy
Annotations:    <none>
Desired Number of Nodes Scheduled: 15
Current Number of Nodes Scheduled: 15
Number of Nodes Scheduled with Up-to-date Pods: 15
Number of Nodes Scheduled with Available Pods: 15
Number of Nodes Misscheduled: 0
Pods Status:  15 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           k8s-app=kube-proxy
  Service Account:  kube-proxy
  Containers:
   kube-proxy:
    Image:  gcr.io/google_containers/kube-proxy-amd64:v1.8.0
    Port:   <none>
    Command:
      /usr/local/bin/kube-proxy
      --kubeconfig=/var/lib/kube-proxy/kubeconfig.conf
      --cluster-cidr=10.244.0.0/16
    Environment:  <none>
    Mounts:
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
  Volumes:
   kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
   xtables-lock:
    Type:  HostPath (bare host directory volume)
    Path:  /run/xtables.lock
Events:    <none>
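
The kubeconfig passed via --kubeconfig comes from the kube-proxy ConfigMap mounted at /var/lib/kube-proxy; it can be inspected directly if needed (output omitted):

$ kubectl get configmap kube-proxy -n kube-system -o yaml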

Add-on Components

The following add-on components are running on the Kubernetes cluster:

  • Kubernetes DNS server (kube-dns)
  • Kubernetes Dashboard (kubernetes-dashboard)
  • An Ingress controller (traefik-ingress-controller)
  • Heapster (see the kubectl top example after this list)
  • Container Network Interface (CNI) network plugin (kube-router)
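
With Heapster deployed, basic resource metrics should be available through kubectl top (output omitted here, since the values vary from moment to moment):

$ kubectl top nodes
$ kubectl top pods -n kube-system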

Some of the add-on components are deployed as DaemonSets:

$ kubectl get daemonsets -n kube-system
NAME                         DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-proxy                   15        15        15        15           15          <none>          97d
kube-router                  15        15        15        15           15          <none>          78d
traefik-ingress-controller   14        14        14        14           14          <none>          62d

Note that traefik-ingress-controller runs on 14 of the 15 nodes: unlike kube-proxy and kube-router, it is not scheduled on the tainted master node. The other add-on components are deployed as Deployments:

$ kubectl get deployments -n kube-system
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
heapster               1         1         1            1           22d
kube-dns               2         2         2            2           97d
kubernetes-dashboard   1         1         1            1           97d
oidc-auth              1         1         1            1           69d
oidc-auth-dev          1         1         1            1           23d
tiller-deploy          1         1         1            1           69d

We can see that, in addition to the standard add-on components, the cluster runs the following components:

  • OpenID Connect authentication (oidc-auth & oidc-auth-dev)
  • Kubernetes Helm (tiller-deploy), for managing Kubernetes charts; a brief usage sketch follows.
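
As a rough illustration of how Tiller is used (Helm v2 style; the chart and namespace names below are placeholders), the helm client on a workstation talks to the tiller-deploy pod in kube-system: helm version reports both the client version and the in-cluster Tiller version, and chart installs go through Tiller as well:

$ helm version
$ helm search <keyword>
$ helm install stable/<chart> --namespace <namespace>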

Services

The following Services are defined in the kube-system namespace:

$ kubectl get services -n kube-system
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                     AGE
heapster                  ClusterIP   10.100.95.3      <none>        80/TCP                                      23d
kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP                               98d
kubelet                   ClusterIP   None             <none>        10250/TCP                                   73d
kubernetes-dashboard      ClusterIP   10.109.25.22     <none>        80/TCP                                      98d
oidc-auth                 ClusterIP   10.109.124.246   <none>        80/TCP                                      70d
oidc-auth-dev             NodePort    10.111.22.41     <none>        80:32411/TCP                                24d
tiller-deploy             ClusterIP   10.110.120.158   <none>        44134/TCP                                   70d
traefik-ingress-service   NodePort    10.109.194.220   <none>        80:31304/TCP,443:30227/TCP,8080:32420/TCP   63d
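
Since traefik-ingress-service is a NodePort Service, kube-proxy opens the listed node ports on every node, so the ingress controller should be reachable at any node address; a hypothetical check against the master’s public IP (the actual HTTP response depends on the configured Ingress rules):

$ curl -sI http://67.58.53.146:31304/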

In the default namespace, only the kubernetes Service is defined:

$ kubectl get services
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   98d

$ kubectl get services kubernetes -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-09-08T23:46:51Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "20"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: fce19b8b-94ef-11e7-b6c9-0cc47a6a1e1e
spec:
  clusterIP: 10.96.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  type: ClusterIP
status:
  loadBalancer: {}
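
Note that this Service has no pod selector; its endpoints are maintained by the API server itself and point straight at the master on port 6443, matching the targetPort above. Listing them should show something like:

$ kubectl get endpoints kubernetes
NAME         ENDPOINTS           AGE
kubernetes   67.58.53.146:6443   98d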