PRP (Pacific Research Platform) has recently deployed a distributed Kubernetes cluster as part of the NSF-funded CHASE-CI project. In this post, we’ll take a look at its architecture.
Nodes
Show the cluster info, using the admin Kubeconfig:
$ kubectl cluster-info
Kubernetes master is running at https://67.58.53.146:6443
Heapster is running at https://67.58.53.146:6443/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://67.58.53.146:6443/api/v1/namespaces/kube-system/services/kube-dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
List all the nodes in the Kubernetes cluster, using the admin config:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
coreos-01.calit2.optiputer.net Ready <none> 97d v1.8.0
coreos-02.calit2.optiputer.net Ready <none> 97d v1.8.0
coreos-03.calit2.optiputer.net Ready <none> 97d v1.8.0
coreos-04.calit2.optiputer.net Ready <none> 97d v1.8.0
coreos-05.calit2.optiputer.net Ready <none> 97d v1.8.0
coreos-06.calit2.optiputer.net Ready <none> 97d v1.8.0
coreos-07.calit2.optiputer.net Ready <none> 97d v1.8.0
coreos-08.calit2.optiputer.net Ready <none> 97d v1.8.0
fiona8.calit2.uci.edu Ready <none> 36d v1.8.0
k8s-gpu-01.calit2.optiputer.net Ready <none> 97d v1.8.1
k8s-gpu-02.calit2.optiputer.net Ready <none> 97d v1.8.0
k8s-nvme-01.sdsc.optiputer.net Ready <none> 55d v1.8.1
ps-100g.sdsu.edu Ready <none> 97d v1.8.0
sdx-controller.calit2.optiputer.net Ready master 97d v1.8.4
siderea.ucsc.edu Ready <none> 36d v1.8.3
The master node is sdx-controller.calit2.optiputer.net, whose public IP address is 67.58.53.146. Note that this public IP address is also the node’s InternalIP for the Kubernetes cluster:
$ kubectl describe nodes sdx-controller.calit2.optiputer.net
Roles:      master
Taints:     node-role.kubernetes.io/master:NoSchedule
Addresses:
  InternalIP:  67.58.53.146
  Hostname:    sdx-controller.calit2.optiputer.net
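To pull just that address out of the node object, a JSONPath query along these lines should do the trick (same node name as above):
$ kubectl get node sdx-controller.calit2.optiputer.net \
    -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'
67.58.53.146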
Contrary to what some hostnames, e.g., coreos-01.calit2.optiputer.net, may appear to imply, all nodes run CentOS 7:
[root@coreos-01 ~]# cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
but they use the “mainline stable” kernel-ml provided by ELRepo:
[root@coreos-01 ~]# uname -a
Linux coreos-01.calit2.optiputer.net 4.13.11-1.el7.elrepo.x86_64 #1 SMP Thu Nov 2 11:29:36 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@coreos-01 ~]# rpm -qi kernel-ml
Name        : kernel-ml
Version     : 4.13.11
Release     : 1.el7.elrepo
Architecture: x86_64
Install Date: Fri 03 Nov 2017 12:53:40 PM PDT
Group       : System Environment/Kernel
Size        : 198868228
License     : GPLv2
Signature   : DSA/SHA1, Thu 02 Nov 2017 09:58:15 AM PDT, Key ID 309bc305baadae52
Source RPM  : kernel-ml-4.13.11-1.el7.elrepo.src.rpm
Build Date  : Thu 02 Nov 2017 09:48:38 AM PDT
Build Host  : Build64R7
Relocations : (not relocatable)
Packager    : Alan Bartlett <ajb@elrepo.org>
Vendor      : The ELRepo Project (http://elrepo.org)
URL         : https://www.kernel.org/
Summary     : The Linux kernel. (The core of any Linux-based operating system.)
Description :
This package provides the Linux kernel (vmlinuz), the core of any
Linux-based operating system. The kernel handles the basic functions
of the OS: memory allocation, process allocation, device I/O, etc.
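For reference, getting kernel-ml onto a CentOS 7 box normally goes through ELRepo’s elrepo-kernel repository; a rough sketch follows (the elrepo-release package version is just a placeholder, and this is not necessarily how these particular nodes were provisioned):
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-ml
grub2-set-default 0    # boot the newly installed kernel by default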
As of this writing (December 3, 2017), there are 4 GPU nodes in the cluster, each of which has eight Nvidia GeForce GTX 1080 GPUs.
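Whether those GPUs are actually advertised to the Kubernetes scheduler can be checked on the node objects. Depending on how GPU support is wired up on a 1.8 cluster, the resource is exposed either as the legacy alpha.kubernetes.io/nvidia-gpu resource or, with the NVIDIA device plugin, as nvidia.com/gpu; a query along these lines (shown for the device-plugin case) lists per-node capacity:
$ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu"
For the legacy resource, substitute .status.capacity.alpha\.kubernetes\.io/nvidia-gpu.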
Control Plane
The Control Plane runs on the master node; it is what controls the whole Kubernetes cluster and makes it function. The components that make up the Control Plane are:
the etcd distributed persistent storage
the API server
the Scheduler
the Controller Manager
List the Control Plane components and their status:
$ kubectl get componentstatuses
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}
The Control Plane components, as well as kube-proxy, can either be deployed on the system directly or run as pods. List all pods in the kube-system namespace:
$ kubectl get pods -o custom-columns=POD:metadata.name,NODE:spec.nodeName \
--sort-by spec.nodeName -n kube-system
POD NODE
traefik-ingress-controller-cn2j4 coreos-01.calit2.optiputer.net
heapster-79c9d5f97-7qfk4 coreos-01.calit2.optiputer.net
kube-proxy-trvrb coreos-01.calit2.optiputer.net
kube-router-7rqvl coreos-01.calit2.optiputer.net
oidc-auth-1834198362-rwcsk coreos-02.calit2.optiputer.net
oidc-auth-dev-7cbd7f56c5-fwjhn coreos-02.calit2.optiputer.net
kube-proxy-rqjff coreos-02.calit2.optiputer.net
kube-router-hdjw5 coreos-02.calit2.optiputer.net
traefik-ingress-controller-cs6qj coreos-02.calit2.optiputer.net
traefik-ingress-controller-pfcbx coreos-03.calit2.optiputer.net
kube-proxy-gt5w9 coreos-03.calit2.optiputer.net
kube-router-zdwvw coreos-03.calit2.optiputer.net
tiller-deploy-5588f6c684-d6j6g coreos-04.calit2.optiputer.net
kube-proxy-448rp coreos-04.calit2.optiputer.net
kube-dns-d6d4674ff-9z8t5 coreos-04.calit2.optiputer.net
traefik-ingress-controller-nhc69 coreos-04.calit2.optiputer.net
kube-router-q42cc coreos-04.calit2.optiputer.net
traefik-ingress-controller-r4q88 coreos-05.calit2.optiputer.net
kube-dns-d6d4674ff-lb9ff coreos-05.calit2.optiputer.net
kube-proxy-qdkzg coreos-05.calit2.optiputer.net
kube-router-57zbh coreos-05.calit2.optiputer.net
kubernetes-dashboard-759ccc86d9-bf7sk coreos-06.calit2.optiputer.net
kube-router-gnrhk coreos-06.calit2.optiputer.net
traefik-ingress-controller-vrqzt coreos-06.calit2.optiputer.net
kube-proxy-z4q07 coreos-06.calit2.optiputer.net
kube-router-26rwg coreos-07.calit2.optiputer.net
traefik-ingress-controller-5mbdm coreos-07.calit2.optiputer.net
kube-proxy-c0h05 coreos-07.calit2.optiputer.net
traefik-ingress-controller-fm7ms coreos-08.calit2.optiputer.net
kube-router-5sqs2 coreos-08.calit2.optiputer.net
kube-proxy-hsv56 coreos-08.calit2.optiputer.net
kube-router-jf974 fiona8.calit2.uci.edu
traefik-ingress-controller-hgwkf fiona8.calit2.uci.edu
kube-proxy-55lkt fiona8.calit2.uci.edu
kube-proxy-cm9jz k8s-gpu-01.calit2.optiputer.net
kube-router-ftlfb k8s-gpu-01.calit2.optiputer.net
traefik-ingress-controller-bbpjh k8s-gpu-01.calit2.optiputer.net
traefik-ingress-controller-xw89x k8s-gpu-02.calit2.optiputer.net
kube-router-m7gh7 k8s-gpu-02.calit2.optiputer.net
kube-proxy-lw5jv k8s-gpu-02.calit2.optiputer.net
traefik-ingress-controller-t69np k8s-nvme-01.sdsc.optiputer.net
kube-router-mkg7g k8s-nvme-01.sdsc.optiputer.net
kube-proxy-pwnpb k8s-nvme-01.sdsc.optiputer.net
traefik-ingress-controller-xx9j2 ps-100g.sdsu.edu
kube-proxy-ps8pl ps-100g.sdsu.edu
kube-router-kptgn ps-100g.sdsu.edu
kube-scheduler-sdx-controller.calit2.optiputer.net sdx-controller.calit2.optiputer.net
kube-router-8m25r sdx-controller.calit2.optiputer.net
kube-proxy-7h6qs sdx-controller.calit2.optiputer.net
kube-apiserver-sdx-controller.calit2.optiputer.net sdx-controller.calit2.optiputer.net
kube-controller-manager-sdx-controller.calit2.optiputer.net sdx-controller.calit2.optiputer.net
etcd-sdx-controller.calit2.optiputer.net sdx-controller.calit2.optiputer.net
kube-proxy-6x4rp siderea.ucsc.edu
kube-router-7lm9v siderea.ucsc.edu
traefik-ingress-controller-mmkcs siderea.ucsc.edu
We can see that the Control Plane components are running as pods on the master node.
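One quick way to double-check is to restrict the listing to the master (the wide output includes a NODE column, so a grep is enough):
$ kubectl get pods -n kube-system -o wide | grep sdx-controller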
Components running on the Worker Nodes
The following components are running on each worker node:
Kubelet
Kubernetes Service Proxy (kube-proxy)
Container Runtime (Docker)
Kubelet and Docker are the only components that always run as regular system components. It is Kubelet that then runs all the other components as pods. To run the Control Plane components as pods, Kubelet is also deployed on the master node.
[root@sdx-controller ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2017-11-21 12:38:30 PST; 3 weeks 2 days ago
     Docs: http://kubernetes.io/docs/
 Main PID: 13796 (kubelet)
   Memory: 111.5M
   CGroup: /system.slice/kubelet.service
           └─13796 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bo...
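The 10-kubeadm.conf drop-in above indicates the cluster was set up with kubeadm, in which case the Control Plane components run as static pods: the Kubelet on the master watches a manifest directory and runs whatever pod definitions it finds there. On a stock kubeadm install that directory looks roughly like this (an illustration of the usual kubeadm layout, not verified output from this node):
[root@sdx-controller ~]# ls /etc/kubernetes/manifests
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml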
Every node runs kube-proxy, whose purpose is to make sure clients can connect to the Services defined through the Kubernetes API. kube-proxy is deployed as a DaemonSet:
$ kubectl describe daemonsets kube-proxy -n kube-system
Name:           kube-proxy
Selector:       k8s-app=kube-proxy
Node-Selector:  <none>
Labels:         k8s-app=kube-proxy
Annotations:    <none>
Desired Number of Nodes Scheduled: 15
Current Number of Nodes Scheduled: 15
Number of Nodes Scheduled with Up-to-date Pods: 15
Number of Nodes Scheduled with Available Pods: 15
Number of Nodes Misscheduled: 0
Pods Status:  15 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           k8s-app=kube-proxy
  Service Account:  kube-proxy
  Containers:
   kube-proxy:
    Image:  gcr.io/google_containers/kube-proxy-amd64:v1.8.0
    Port:   <none>
    Command:
      /usr/local/bin/kube-proxy
      --kubeconfig=/var/lib/kube-proxy/kubeconfig.conf
      --cluster-cidr=10.244.0.0/16
    Environment:  <none>
    Mounts:
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
  Volumes:
   kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
   xtables-lock:
    Type:  HostPath (bare host directory volume)
    Path:  /run/xtables.lock
Events:  <none>
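For comparison, a DaemonSet manifest that produces this kind of one-pod-per-node deployment has roughly the following shape. This is a trimmed-down sketch based on the describe output above (the real kube-proxy manifest also sets hostNetwork, tolerations, the service account, and the ConfigMap/hostPath volumes shown earlier):
apiVersion: apps/v1beta2        # DaemonSet API version available in Kubernetes 1.8
kind: DaemonSet
metadata:
  name: kube-proxy
  namespace: kube-system
  labels:
    k8s-app: kube-proxy
spec:
  selector:
    matchLabels:
      k8s-app: kube-proxy
  template:
    metadata:
      labels:
        k8s-app: kube-proxy
    spec:
      containers:
      - name: kube-proxy
        image: gcr.io/google_containers/kube-proxy-amd64:v1.8.0
        command:
        - /usr/local/bin/kube-proxy
        - --kubeconfig=/var/lib/kube-proxy/kubeconfig.conf
        - --cluster-cidr=10.244.0.0/16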
Add-on Components
The following add-on components are running on the Kubernetes cluster:
Kubernetes DNS server (kube-dns)
Kubernetes Dashboard (kubernetes-dashboard)
An Ingress controller (traefik-ingress-controller)
Heapster
Container Network Interface (CNI) network plugin (kube-router)
Some of the add-on components are deployed as DaemonSets:
$ kubectl get daemonsets -n kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-proxy 15 15 15 15 15 <none> 97d
kube-router 15 15 15 15 15 <none> 78d
traefik-ingress-controller 14 14 14 14 14 <none> 62d
whereas others are deployed as Deployments:
$ kubectl get deployments -n kube-system
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
heapster 1 1 1 1 22d
kube-dns 2 2 2 2 97d
kubernetes-dashboard 1 1 1 1 97d
oidc-auth 1 1 1 1 69d
oidc-auth-dev 1 1 1 1 23d
tiller-deploy 1 1 1 1 69d
We can see that, in addition to the standard add-on components, the cluster runs the following components:
OpenID Connect authentication (oidc-auth & oidc-auth-dev)
Kubernetes Helm (tiller-deploy), for managing Kubernetes charts.
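With Tiller running in the cluster, the helm client on a workstation (pointed at the same kubeconfig) can inspect and install charts. Typical Helm 2 usage looks something like the following, where the chart and release names are just arbitrary examples:
$ helm version        # reports both the client and the Tiller (server) version
$ helm list           # releases currently managed by Tiller
$ helm install stable/nginx-ingress --name my-ingress --namespace kube-system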
Services
The following Services are defined in the kube-system namespace:
$ kubectl get services -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
heapster ClusterIP 10.100.95.3 <none> 80/TCP 23d
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 98d
kubelet ClusterIP None <none> 10250/TCP 73d
kubernetes-dashboard ClusterIP 10.109.25.22 <none> 80/TCP 98d
oidc-auth ClusterIP 10.109.124.246 <none> 80/TCP 70d
oidc-auth-dev NodePort 10.111.22.41 <none> 80:32411/TCP 24d
tiller-deploy ClusterIP 10.110.120.158 <none> 44134/TCP 70d
traefik-ingress-service NodePort 10.109.194.220 <none> 80:31304/TCP,443:30227/TCP,8080:32420/TCP 63d
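Most of these Services are ClusterIP-only, so they are not directly reachable from outside the cluster (the NodePort ones, such as traefik-ingress-service, are exposed on every node at the listed ports). The usual way to poke at a ClusterIP Service from a workstation, e.g. the Dashboard, is to tunnel through the API server with kubectl proxy:
$ kubectl proxy
Starting to serve on 127.0.0.1:8001
and then browse to http://localhost:8001/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy/ (the same proxy URL scheme as the Heapster and KubeDNS endpoints shown by kubectl cluster-info above).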
And in the default namespace, the kubernetes service is defined:
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 98d
$ kubectl get services kubernetes -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-09-08T23:46:51Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "20"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: fce19b8b-94ef-11e7-b6c9-0cc47a6a1e1e
spec:
  clusterIP: 10.96.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  type: ClusterIP
status:
  loadBalancer: {}
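The targetPort of 6443 points back at the API server itself; listing the Endpoints behind this Service should confirm that it resolves to the master’s public address, along the lines of:
$ kubectl get endpoints kubernetes
NAME         ENDPOINTS           AGE
kubernetes   67.58.53.146:6443   98d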