Works with Kubernetes 1.16!

For my preparation for the Cloud Native Computing Foundation - Certified Kubernetes Administrator exam (CNCF CKA for short), it is important to learn the ins and outs of creating Kubernetes clusters by hand. This includes generating all the certificates, systemd unit files, and K8s configs, as well as installing the components.

Most of you will already know about Kelsey Hightower’s fantastic “Kubernetes The Hard Way” tutorial on GitHub. There is even a LinuxAcademy course, and there are forks of the tutorial for bare-metal installations (GitHub / Medium). And of course there are already several guides for AWS (here and here).

So why write another one? Simple answer: I could not find a single all-in-one guide for a multi-master, non-stacked Kubernetes setup on AWS. Besides, it was a good use case for taking my first baby steps with the AWS CDK for Python :)

If you haven’t already checked out the CDK, give it a try! => https://aws.amazon.com/cdk

So what is this all about? Executive summary: it creates the infrastructure and components for a multi-master, non-stacked Kubernetes cluster (v1.16!).

Features

  • CDK (Python, produces CloudFormation) code available
  • Terraform code available
  • Multi Master HA Kubernetes control plane
  • Non-Stacked setup (etcd servers are running on their own instances)
  • Default: 9x EC2 instances
    • 3x etcd nodes
    • 3x master nodes
    • 3x worker nodes
  • Route53 records for all EC2 instances’ internal & external IPv4 addresses
  • Load Balancer (external kubectl access, fronts Kubernetes API Servers)
  • External access to all EC2 nodes by your workstation IPv4 address only

Infrastructure as Code

Terraform: hajowieland/terraform-k8s-the-right-hard-way-aws

AWS CDK (Python, CloudFormation): hajowieland/cdk-py-k8s-the-right-hard-way-aws

Create Infrastructure

First we need to create our infrastructure; you can use either the AWS CDK or the Terraform repository above. Both create the same infrastructure with nine EC2 instances for a non-stacked Kubernetes setup. If you change the number of nodes, you have to adapt the instructions below accordingly.
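
A minimal sketch of the typical workflow - assuming AWS credentials are configured and, for the CDK, a Python virtual environment; the exact variables and steps are in each repository’s README, so treat this as an assumption rather than the repos’ literal instructions:

# Terraform variant (sketch)
git clone https://github.com/hajowieland/terraform-k8s-the-right-hard-way-aws.git
cd terraform-k8s-the-right-hard-way-aws
terraform init    # download providers and modules
terraform apply   # review the plan, then confirm to create the nine EC2 instances & co.

# AWS CDK (Python) variant (sketch)
git clone https://github.com/hajowieland/cdk-py-k8s-the-right-hard-way-aws.git
cd cdk-py-k8s-the-right-hard-way-aws
pip install -r requirements.txt   # assumed dependency file, see the repo README
cdk deploy                        # synthesizes CloudFormation and deploys it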

You have to configure the Route53 Hosted Zone to use, so the IaC can create DNS A records for all nodes’ public and internal IPv4 addresses (you can verify them afterwards with the dig check shown after this list):

  • etcd{1-3}.example.com (Public IPv4)
  • etcd{1-3}.internal.example.com (Internal IPv4)
  • master{1-3}.example.com (Public IPv4)
  • master{1-3}.internal.example.com (Internal IPv4)
  • worker{1-3}.example.com (Public IPv4)
  • worker{1-3}.internal.example.com (Internal IPv4)
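
Once the stack is up, you can sanity-check the records from your workstation - a quick hedged check, with example.com standing in for your Hosted Zone:

for node in etcd1 etcd2 etcd3 master1 master2 master3 worker1 worker2 worker3; do
  echo "${node}: $(dig +short ${node}.example.com) / $(dig +short ${node}.internal.example.com)"
done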

It also provisions the instances with a User Data script that installs cfssl for creating the K8s certificates (but we will create them later locally on our workstation and transfer them to the nodes).
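
The actual script lives in the IaC repositories; as a rough sketch (assuming the cfssl R1.2 release binaries from pkg.cfssl.org), it does something along these lines:

#!/bin/bash
# Sketch of the User Data: fetch cfssl/cfssljson and put them on the PATH
wget -q https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
chmod +x cfssl_linux-amd64 cfssljson_linux-amd64
sudo mv cfssl_linux-amd64 /usr/local/bin/cfssl
sudo mv cfssljson_linux-amd64 /usr/local/bin/cfssljson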

It also creates some Security Groups, which are very basic - you should not use production data on these instances 😉 It’s for learning purposes! The rules (see the IP check after this list if your workstation address changes):

  • ANY (TCP) from your workstation external IPv4
  • ANY (TCP) within VPC
  • ANY Egress
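
If your workstation’s public IPv4 changes later, the ingress rules no longer match and SSH/kubectl access will fail. Check your current address like this and update the IaC variable / Security Groups accordingly:

# The public IPv4 address the Security Groups should allow
curl -s https://checkip.amazonaws.com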

And finally a Load Balancer which is used as our remote endpoint for kubectl access:

  • ELB (fronts kube-apiservers)

Create Kubernetes Cluster the (right) hard way

Notes

❕You need to replace napo.io in all the following commands with your own domain!

Example:

internal.napo.io ⏩ internal.mydomain.net

napo.io ⏩ mydomain.net
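
If you keep the commands of this guide in a local file (hypothetical commands.sh), one way to do the replacement in bulk:

# GNU sed (Linux)
sed -i 's/napo\.io/mydomain.net/g' commands.sh

# BSD sed (macOS)
sed -i '' 's/napo\.io/mydomain.net/g' commands.sh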

For parallel execution on multiple instances at once, use tmux and this multiplexer script: https://gist.github.com/dmytro/3984680
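
If you prefer plain tmux over the linked script, a minimal sketch using synchronized panes (assuming the ~/.ssh/config host aliases defined below) looks like this:

# One pane per etcd node, keystrokes mirrored to all panes
tmux new-session -d -s etcd 'ssh etcd1'
tmux split-window -h -t etcd 'ssh etcd2'
tmux split-window -v -t etcd 'ssh etcd3'
tmux select-layout -t etcd tiled
tmux set-window-option -t etcd synchronize-panes on
tmux attach -t etcd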

And to make your life easier, I would highly recommend creating ~/.ssh/config entries for all internal and external EC2 instances’ DNS records like this:

ℹ️ Replace the Domain and IdentityFile (your AWS EC2 Key Pair) accordingly

Host etcd1
  HostName etcd1.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host etcd1-internal
  HostName etcd1.internal.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host etcd2
  HostName etcd2.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host etcd2-internal
  HostName etcd2.internal.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host etcd3
  HostName etcd3.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host etcd3-internal
  HostName etcd3.internal.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host master1
  HostName master1.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host master1-internal
  HostName master1.internal.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host master2
  HostName master2.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host master2-internal
  HostName master2.internal.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host master3
  HostName master3.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host master3-internal
  HostName master3.internal.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host worker1
  HostName worker1.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa


Host worker1-internal
  HostName worker1.internal.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host worker2
  HostName worker2.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host worker2-internal
  HostName worker2.internal.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host worker3
  HostName worker3.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Host worker3-internal
  HostName worker3.internal.napo.io
  User ubuntu
  IdentityFile ~/.ssh/id_rsa

Create certificates

On your local workstation, create the certificates needed for Kubernetes (CA, Signing Requests, etc.)

Certificate Authority

For our Certificate Authority (Lifetime 8760h => 1 year) we create 4096-bit RSA keys:

echo '{
  "signing": {
    "default": {
      "expiry": "8760h"
    },
    "profiles": {
      "kubernetes": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "8760h"
      }
    }
  }
}' > ca-config.json
echo '{
  "CN": "Kubernetes",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "Kubernetes",
      "OU": "Kubernetes The Right Hard Way",
      "ST": "napo.io"
    }
  ]
}' > ca-csr.json
cfssl gencert -initca ca-csr.json | cfssljson -bare ca

Client and Server Certificates

Now create client and server certificates with their corresponding Certificate Signing Requests (CSRs)

Admin Client Certificate

cat > admin-csr.json <<EOF
{
  "CN": "admin",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "system:masters",
      "OU": "Kubernetes The Right Hard Way",
      "ST": "napo.io"
    }
  ]
}
EOF
cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -profile=kubernetes \
  admin-csr.json | cfssljson -bare admin

Kubelet Client Certificates

For the kubelet client certs we need to get the internal and external Worker Node IPv4 addresses first.

Get internal IPv4 addresses:

for i in 1 2 3; do export WORKER${i}_INTERNAL=$(dig +short worker${i}.internal.napo.io); done

Get external IPv4 addresses:

for i in 1 2 3; do export WORKER${i}_EXTERNAL=$(dig +short worker${i}.napo.io); done
for i in 1 2 3; do
  cat > worker${i}-csr.json <<EOF
  {
    "CN": "system:node:worker${i}.internal.napo.io",
    "key": {
      "algo": "rsa",
      "size": 4096
    },
    "names": [
      {
        "C": "DE",
        "L": "Munich",
        "O": "system:nodes",
        "OU": "Kubernetes The Right Hard Way",
        "ST": "napo.io"
      }
    ]
  }
EOF
done

Worker 1:

cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -hostname=worker1.internal.napo.io,${WORKER1_INTERNAL},${WORKER1_EXTERNAL} \
  -profile=kubernetes \
  worker1-csr.json | cfssljson -bare worker1

Worker 2:

cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -hostname=worker2.internal.napo.io,${WORKER2_INTERNAL},${WORKER2_EXTERNAL} \
  -profile=kubernetes \
  worker2-csr.json | cfssljson -bare worker2

Worker 3:

cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -hostname=worker3.internal.napo.io,${WORKER3_INTERNAL},${WORKER3_EXTERNAL} \
  -profile=kubernetes \
  worker3-csr.json | cfssljson -bare worker3

kube-proxy Client Certificate

Now we create everything needed for the kube-proxy component.

cat > kube-proxy-csr.json <<EOF
{
  "CN": "system:kube-proxy",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "system:node-proxier",
      "OU": "Kubernetes The Right Hard Way",
      "ST": "napo.io"
    }
  ]
}
EOF
cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -profile=kubernetes \
  kube-proxy-csr.json | cfssljson -bare kube-proxy

Kubernetes API Server Certificate

And finally the certificate for the kube-apiserver.

cat > kubernetes-csr.json <<EOF
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "Kubernetes",
      "OU": "Kubernetes The Right Hard Way",
      "ST": "napo.io"
    }
  ]
}
EOF

ℹ️ NOTE: 10.32.0.1 ==> kubernetes.default.svc.cluster.local.

https://github.com/kelseyhightower/kubernetes-the-hard-way/issues/105

First get the ELB (Load Balancer) DNS name:

ELB_DNS=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[].DNSName' --output text)
for i in 1 2 3; do export ETCD${i}_INTERNAL=$(dig +short etcd${i}.internal.napo.io); done
for i in 1 2 3; do export MASTER${i}_INTERNAL=$(dig +short master${i}.internal.napo.io); done
for i in 1 2 3; do export WORKER${i}_INTERNAL=$(dig +short worker${i}.internal.napo.io); done
cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -hostname=10.32.0.1,${ETCD1_INTERNAL},${ETCD2_INTERNAL},${ETCD3_INTERNAL},${WORKER1_INTERNAL},${WORKER2_INTERNAL},${WORKER3_INTERNAL},${MASTER1_INTERNAL},${MASTER2_INTERNAL},${MASTER3_INTERNAL},master1.internal.napo.io,master2.internal.napo.io,master3.internal.napo.io,etcd1.internal.napo.io,etcd2.internal.napo.io,etcd3.internal.napo.io,${ELB_DNS},127.0.0.1,kubernetes.default \
  -profile=kubernetes \
  kubernetes-csr.json | cfssljson -bare kubernetes

Distribute the Client and Server Certificates

Now with everything in place, we scp all certificates to the node instances.

ℹ️ The below commands only work if you have created ~/.ssh/config entries as stated at the beginning!

Workers

for worker in worker1 worker2 worker3; do
  scp ca.pem ${worker}-key.pem ${worker}.pem ${worker}:~/
done

Masters/Controllers

for master in master1 master2 master3; do
  scp ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem ${master}:~/
done

etcd

for etcd in etcd1 etcd2 etcd3; do
  scp ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem ${etcd}:~/
done

Generating Kubernetes Configuration Files for Authentication

Here we generate the files needed for authentication in Kubernetes.

Client Authentication Configs

Kubernetes Public IP Address

Get the Load Balancer’s DNS name:

KUBERNETES_PUBLIC_ADDRESS=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[].DNSName' --output text)

kubelet Kubernetes Configuration Files

Generate configuration files for kubelet:

for i in 1 2 3; do
  instance="worker${i}"
  instance_hostname="worker${i}.internal.napo.io"
  kubectl config set-cluster kubernetes-the-real-hard-way \
    --certificate-authority=ca.pem \
    --embed-certs=true \
    --server=https://${KUBERNETES_PUBLIC_ADDRESS}:6443 \
    --kubeconfig=${instance}.kubeconfig

  kubectl config set-credentials system:node:${instance_hostname} \
    --client-certificate=${instance}.pem \
    --client-key=${instance}-key.pem \
    --embed-certs=true \
    --kubeconfig=${instance}.kubeconfig

  kubectl config set-context default \
    --cluster=kubernetes-the-real-hard-way \
    --user=system:node:${instance_hostname} \
    --kubeconfig=${instance}.kubeconfig

  kubectl config use-context default \
    --kubeconfig=${instance}.kubeconfig
done

The kube-proxy Kubernetes Configuration File

Generate a kubeconfig file for the kube-proxy service:

kubectl config set-cluster kubernetes-the-real-hard-way \
  --certificate-authority=ca.pem \
  --embed-certs=true \
  --server=https://${KUBERNETES_PUBLIC_ADDRESS}:6443 \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config set-credentials kube-proxy \
  --client-certificate=kube-proxy.pem \
  --client-key=kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config set-context default \
  --cluster=kubernetes-the-real-hard-way \
  --user=kube-proxy \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config use-context default \
  --kubeconfig=kube-proxy.kubeconfig

Distribute the Kubernetes Configuration Files

And now transfer the configuration files to the worker nodes:

for worker in worker1 worker2 worker3; do
  scp ${worker}.kubeconfig kube-proxy.kubeconfig ${worker}:~/
done

Generating the Data Encryption Config and Key

For encrypting Kubernetes Secrets at rest, we create an encryption key and config.

The Encryption Key

ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)

The Encryption Config File

cat > encryption-config.yaml <<EOF
kind: EncryptionConfig
apiVersion: v1
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: ${ENCRYPTION_KEY}
      - identity: {}
EOF

and transfer the encryption config file to the master nodes:

for master in master1 master2 master3; do
  scp encryption-config.yaml ${master}:~/
done

Bootstrapping the etcd Cluster

Now it is time to bootstrap our etcd cluster, which is our highly available key-value store for the Kubernetes API.

SSH to etcd1, etcd2, etcd3 via tmux multiplexer:

Execute on each etcd:

export INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

Install etcd:

wget -q --show-progress --https-only --timestamping \
  "https://github.com/etcd-io/etcd/releases/download/v3.4.3/etcd-v3.4.3-linux-amd64.tar.gz"
{
  tar -xvf etcd-v3.4.3-linux-amd64.tar.gz
  sudo mv etcd-v3.4.3-linux-amd64/etcd* /usr/local/bin/
}
{
  sudo mkdir -p /etc/etcd /var/lib/etcd
  sudo cp ca.pem kubernetes-key.pem kubernetes.pem /etc/etcd/
}

Get etcd nodes internal IPv4 addresses:

for i in 1 2 3; do export ETCD${i}_INTERNAL=$(dig +short etcd${i}.internal.napo.io); done

Generate the etcd systemd unit file:

cat > etcd.service <<EOF
[Unit]
Description=etcd
Documentation=https://github.com/coreos

[Service]
ExecStart=/usr/local/bin/etcd \\
  --name ${HOSTNAME} \\
  --cert-file=/etc/etcd/kubernetes.pem \\
  --key-file=/etc/etcd/kubernetes-key.pem \\
  --peer-cert-file=/etc/etcd/kubernetes.pem \\
  --peer-key-file=/etc/etcd/kubernetes-key.pem \\
  --trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-client-cert-auth \\
  --client-cert-auth \\
  --initial-advertise-peer-urls https://${INTERNAL_IP}:2380 \\
  --listen-peer-urls https://${INTERNAL_IP}:2380 \\
  --listen-client-urls https://${INTERNAL_IP}:2379,http://127.0.0.1:2379 \\
  --advertise-client-urls https://${INTERNAL_IP}:2379 \\
  --initial-cluster-token etcd-cluster-0 \\
  --initial-cluster etcd1.internal.napo.io=https://${ETCD1_INTERNAL}:2380,etcd2.internal.napo.io=https://${ETCD2_INTERNAL}:2380,etcd3.internal.napo.io=https://${ETCD3_INTERNAL}:2380 \\
  --initial-cluster-state new \\
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Move the files to the right place, reload systemd and enable + start the etcd service:

sudo mv etcd.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd

Check if etcd works

Check for any errors in systemd:

systemctl status etcd

List etcd members:

ETCDCTL_API=3 etcdctl member list
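
Optionally - a hedged extra check, not part of the original flow - verify the health of all peers using the same certificates etcd itself is configured with:

sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://${ETCD1_INTERNAL}:2379,https://${ETCD2_INTERNAL}:2379,https://${ETCD3_INTERNAL}:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem \
  endpoint health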

Bootstrapping the Kubernetes Control Plane

Now it is time to bootstrap our Kubernetes Master Nodes (the “Control Plane” nodes).

SSH to master1, master2, master3 via tmux multiplexer:

Get the latest stable Kubernetes version (currently 1.16.1 as of this writing):

KUBERNETES_STABLE=$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)

Generate the Kubernetes config directory, download kube components and move them to /usr/local/bin:

sudo mkdir -p /etc/kubernetes/config

wget -q --show-progress --https-only --timestamping \
  "https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-apiserver" \
  "https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-controller-manager" \
  "https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-scheduler" \
  "https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kubectl"
chmod +x kube-apiserver kube-controller-manager kube-scheduler kubectl
sudo mv kube-apiserver kube-controller-manager kube-scheduler kubectl /usr/local/bin/

Create directory for certificate files:

sudo mkdir -p /var/lib/kubernetes/
sudo mv ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem encryption-config.yaml /var/lib/kubernetes/

Get internal IPv4 address via EC2 metadata link-local service:

INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

and the etcd nodes’ internal IPv4 addresses via dig:

for i in 1 2 3; do export ETCD${i}_INTERNAL=$(dig +short etcd${i}.internal.napo.io); done

Create kube-apiserver systemd file:

cat > kube-apiserver.service <<EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-apiserver \\
  --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,PersistentVolumeClaimResize,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota \\
  --advertise-address=${INTERNAL_IP} \\
  --allow-privileged=true \\
  --apiserver-count=3 \\
  --audit-log-maxage=30 \\
  --audit-log-maxbackup=3 \\
  --audit-log-maxsize=100 \\
  --audit-log-path=/var/log/audit.log \\
  --authorization-mode=Node,RBAC \\
  --bind-address=0.0.0.0 \\
  --client-ca-file=/var/lib/kubernetes/ca.pem \\
  --etcd-cafile=/var/lib/kubernetes/ca.pem \\
  --etcd-certfile=/var/lib/kubernetes/kubernetes.pem \\
  --etcd-keyfile=/var/lib/kubernetes/kubernetes-key.pem \\
  --etcd-servers=https://${ETCD1_INTERNAL}:2379,https://${ETCD2_INTERNAL}:2379,https://${ETCD3_INTERNAL}:2379 \\
  --event-ttl=1h \\
  --encryption-provider-config=/var/lib/kubernetes/encryption-config.yaml \\
  --insecure-bind-address=127.0.0.1 \\
  --kubelet-certificate-authority=/var/lib/kubernetes/ca.pem \\
  --kubelet-client-certificate=/var/lib/kubernetes/kubernetes.pem \\
  --kubelet-client-key=/var/lib/kubernetes/kubernetes-key.pem \\
  --kubelet-https=true \\
  --runtime-config=api/all \\
  --service-account-key-file=/var/lib/kubernetes/ca-key.pem \\
  --service-cluster-ip-range=10.32.0.0/24 \\
  --service-node-port-range=30000-32767 \\
  --tls-cert-file=/var/lib/kubernetes/kubernetes.pem \\
  --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \\
  --v=5
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Create kube-controller-manager systemd file:

cat > kube-controller-manager.service <<EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-controller-manager \\
  --address=0.0.0.0 \\
  --cluster-cidr=10.200.0.0/16 \\
  --cluster-name=kubernetes \\
  --cluster-signing-cert-file=/var/lib/kubernetes/ca.pem \\
  --cluster-signing-key-file=/var/lib/kubernetes/ca-key.pem \\
  --leader-elect=true \\
  --master=http://127.0.0.1:8080 \\
  --root-ca-file=/var/lib/kubernetes/ca.pem \\
  --service-account-private-key-file=/var/lib/kubernetes/ca-key.pem \\
  --service-cluster-ip-range=10.32.0.0/24 \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Create kube-scheduler systemd file:

cat > kube-scheduler.service <<EOF
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-scheduler \\
  --leader-elect=true \\
  --master=http://127.0.0.1:8080 \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Move the files to the right place, reload systemd and enable + start the kube-* services:

sudo mv kube-apiserver.service kube-scheduler.service kube-controller-manager.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable kube-apiserver kube-controller-manager kube-scheduler
sudo systemctl start kube-apiserver kube-controller-manager kube-scheduler

Verify that everything works

Sadly, kubectl get componentstatuses (short: kubectl get cs) is deprecated and does not work correctly with Kubernetes 1.16 - the tables get mixed up: https://github.com/kubernetes/kubernetes/issues/83024

But we can check with increased verbosity if everything is healthy:

kubectl get cs -v=8

Additionally we check for errors via systemd:

systemctl status kube-apiserver
systemctl status kube-controller-manager
systemctl status kube-scheduler
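
As an additional hedged check (the unit file above enables the insecure port on 127.0.0.1:8080), you can query the API server health endpoints directly; both should return "ok":

# Local insecure port (loopback only)
curl -s http://127.0.0.1:8080/healthz

# TLS endpoint, verified against our CA (127.0.0.1 is among the certificate SANs)
curl -s --cacert /var/lib/kubernetes/ca.pem https://127.0.0.1:6443/healthz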

RBAC for Kubelet Authorization

Role-based access control (RBAC) is the authorization (authz) concept used by Kubernetes. We need to create a ClusterRole and its ClusterRoleBinding so the kube-apiserver is allowed to access the kubelet API on each node.

SSH to master1:

cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-apiserver-to-kubelet
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/proxy
      - nodes/stats
      - nodes/log
      - nodes/spec
      - nodes/metrics
    verbs:
      - "*"
EOF

cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: system:kube-apiserver
  namespace: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-apiserver-to-kubelet
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: kubernetes
EOF

Bootstrapping the Kubernetes Worker Nodes

Now we configure the Worker Nodes that run the Pods in our Kubernetes cluster.

Provisioning Kubernetes Worker Nodes

SSH to worker1, worker2, worker3 via tmux multiplexer:

Install the OS dependencies:

KUBERNETES_STABLE=$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)

sudo apt-get update
sudo apt-get -y install socat

Download & Install Worker Binaries

First download the worker binaries:

wget -q --show-progress --https-only --timestamping \
  https://github.com/containernetworking/plugins/releases/download/v0.8.2/cni-plugins-linux-amd64-v0.8.2.tgz \
  https://github.com/containerd/containerd/releases/download/v1.3.0/containerd-1.3.0.linux-amd64.tar.gz \
  https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kubectl \
  https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-proxy \
  https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kubelet

Create the installation directories:

sudo mkdir -p \
  /etc/cni/net.d \
  /opt/cni/bin \
  /var/lib/kubelet \
  /var/lib/kube-proxy \
  /var/lib/kubernetes \
  /var/run/kubernetes

and finally install the worker binaries:

sudo tar -xvf cni-plugins-linux-amd64-v0.8.2.tgz -C /opt/cni/bin/
sudo tar -xvf containerd-1.3.0.linux-amd64.tar.gz -C /
chmod +x kubectl kube-proxy kubelet
sudo mv kubectl kube-proxy kubelet /usr/local/bin/

Configure CNI Networking

Retrieve the Pod CIDR range environment variable we set via User Data during instance creation (done in the Terraform/CDK IaC code):

echo $POD_CIDR

Create the bridge network configuration file:

cat <<EOF | sudo tee /etc/cni/net.d/10-bridge.conf
{
    "cniVersion": "0.3.1",
    "name": "bridge",
    "type": "bridge",
    "bridge": "cnio0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "ranges": [
          [{"subnet": "${POD_CIDR}"}]
        ],
        "routes": [{"dst": "0.0.0.0/0"}]
    }
}
EOF

Create the loopback network configuration file:

cat <<EOF | sudo tee /etc/cni/net.d/99-loopback.conf
{
    "cniVersion": "0.3.1",
    "type": "loopback"
}
EOF

Configure containerd

Install runc, a CLI tool for spawning and running containers according to the OCI runtime specification.

sudo apt-get install runc -y

Create the containerd configuration file:

sudo mkdir -p /etc/containerd/

cat << EOF | sudo tee /etc/containerd/config.toml
[plugins]
  [plugins.cri.containerd]
    snapshotter = "overlayfs"
    [plugins.cri.containerd.default_runtime]
      runtime_type = "io.containerd.runtime.v1.linux"
      runtime_engine = "/usr/sbin/runc"
      runtime_root = ""
    [plugins.cri.containerd.untrusted_workload_runtime]
      runtime_type = "io.containerd.runtime.v1.linux"
      runtime_engine = "/usr/sbin/runsc"
      runtime_root = "/run/containerd/runsc"
EOF

ℹ️ INFO: Untrusted workloads will be run using the gVisor (runsc) container runtime sandbox.

Create the containerd.service systemd file:

cat <<EOF | sudo tee /etc/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity

[Install]
WantedBy=multi-user.target
EOF

Configure the Kubelet

Move the certificates and kubeconfig files to the right directories and create the kubelet configuration:

{
  sudo mv ${HOSTNAME}-key.pem ${HOSTNAME}.pem /var/lib/kubelet/
  sudo mv ${HOSTNAME}.kubeconfig /var/lib/kubelet/kubeconfig
  sudo mv ca.pem /var/lib/kubernetes/
}

Create the kubelet-config.yaml configuration file:

cat <<EOF | sudo tee /var/lib/kubelet/kubelet-config.yaml
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: "/var/lib/kubernetes/ca.pem"
authorization:
  mode: Webhook
clusterDomain: "cluster.local"
clusterDNS:
  - "10.32.0.10"
podCIDR: "${POD_CIDR}"
runtimeRequestTimeout: "15m"
tlsCertFile: "/var/lib/kubelet/${HOSTNAME}.pem"
tlsPrivateKeyFile: "/var/lib/kubelet/${HOSTNAME}-key.pem"
EOF

Create the kubelet.service systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=containerd.service
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \\
  --config=/var/lib/kubelet/kubelet-config.yaml \\
  --container-runtime=remote \\
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \\
  --image-pull-progress-deadline=2m \\
  --kubeconfig=/var/lib/kubelet/kubeconfig \\
  --network-plugin=cni \\
  --register-node=true \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Configure the Kubernetes Proxy

And finally the kube-proxy configuration.

sudo mv kube-proxy.kubeconfig /var/lib/kube-proxy/kubeconfig

Create the kube-proxy-config.yaml configuration file:

cat <<EOF | sudo tee /var/lib/kube-proxy/kube-proxy-config.yaml
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
clientConnection:
  kubeconfig: "/var/lib/kube-proxy/kubeconfig"
mode: "iptables"
clusterCIDR: "10.200.0.0/16"
EOF

Create the kube-proxy.service systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/kube-proxy.service
[Unit]
Description=Kubernetes Kube Proxy
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-proxy \\
  --config=/var/lib/kube-proxy/kube-proxy-config.yaml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Start the Worker Services

Reload systemd and enable + start the containerd, kubelet and kube-proxy services:

{
  sudo systemctl daemon-reload
  sudo systemctl enable containerd kubelet kube-proxy
  sudo systemctl start containerd kubelet kube-proxy
}

Verification

Copy admin.kubeconfig to master servers:

for master in master1 master2 master3; do
  scp admin.kubeconfig ${master}:~/
done

Connect to master servers via tmux multiplexer and get worker nodes:

kubectl get nodes --kubeconfig admin.kubeconfig

Configuring kubectl for Remote Access

We want to access the Kubernetes Cluster from our workstation via kubectl. So on your workstation (where you created the certificate and config files) execute the following steps for remote kubectl access.

The Admin Kubernetes Configuration File

We need to configure the Kubernetes API server endpoint we want to connect to. For High Availability we created a Load Balancer that fronts the Kubernetes Master Servers (kube-apiservers). The LB’s DNS name is our external endpoint for remote access.

Generate the kubeconfig file suitable for authenticating as admin user:

KUBERNETES_PUBLIC_ADDRESS=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[].DNSName' --output text)
{
  kubectl config set-cluster kubernetes-the-real-hard-way \
    --certificate-authority=ca.pem \
    --embed-certs=true \
    --server=https://${KUBERNETES_PUBLIC_ADDRESS}:6443

  kubectl config set-credentials admin \
    --client-certificate=admin.pem \
    --client-key=admin-key.pem

  kubectl config set-context kubernetes-the-real-hard-way \
    --cluster=kubernetes-the-real-hard-way \
    --user=admin

  kubectl config use-context kubernetes-the-real-hard-way
}

Verify everything works from your workstation:

kubectl get nodes

Provisioning Pod Network Routes

Pods scheduled to a node receive an IP address from the node’s Pod CIDR range. At this point, pods cannot communicate with pods running on other nodes due to missing network routes.

Every pod that gets scheduled to a node receives an IPv4 address from that node’s POD_CIDR range. With our IaC we exported this environment variable via User Data, using the 10.200.x.0/24 ranges.

Now we have to create routes in the AWS Route Tables for each Worker Node that map the node’s POD_CIDR range to the node’s internal IPv4 address.

ℹ️ This way we do not have to install any additional CNI plugin.

Of course, we could instead install e.g. Flannel or use some other way of achieving Kubernetes networking.

SSH to every worker and get the POD_CIDR environment variable:

for worker in worker1 worker2 worker3; do dig +short ${worker}.internal.napo.io && ssh ${worker} -q 'echo $POD_CIDR'; done

Routes

Create network routes for each worker instance via the [aws-cli](https://aws.amazon.com/cli/):

ROUTE_TABLE_ID_0=$(aws ec2 describe-route-tables \
  --filters "Name=tag:Name,Values=cdk-python-k8s-right-way-aws/k8s-hard-way-vpc/PublicSubnet1" | \
  jq -r '.RouteTables[].RouteTableId')

ROUTE_TABLE_ID_1=$(aws ec2 describe-route-tables \
  --filters "Name=tag:Name,Values=cdk-python-k8s-right-way-aws/k8s-hard-way-vpc/PublicSubnet2" | \
  jq -r '.RouteTables[].RouteTableId')

ROUTE_TABLE_ID_2=$(aws ec2 describe-route-tables \
  --filters "Name=tag:Name,Values=cdk-python-k8s-right-way-aws/k8s-hard-way-vpc/PublicSubnet3" | \
  jq -r '.RouteTables[].RouteTableId')


WORKER_0_INSTANCE_ID=$(aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=cdk-python-k8s-right-way-aws/worker1" | \
  jq -j '.Reservations[].Instances[].InstanceId')

WORKER_1_INSTANCE_ID=$(aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=cdk-python-k8s-right-way-aws/worker2" | \
  jq -j '.Reservations[].Instances[].InstanceId')

WORKER_2_INSTANCE_ID=$(aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=cdk-python-k8s-right-way-aws/worker3" | \
  jq -j '.Reservations[].Instances[].InstanceId')


aws ec2 create-route \
  --route-table-id ${ROUTE_TABLE_ID_0} \
  --destination-cidr-block 10.200.0.0/24 \
  --instance-id ${WORKER_0_INSTANCE_ID}

aws ec2 create-route \
  --route-table-id ${ROUTE_TABLE_ID_1} \
  --destination-cidr-block 10.200.0.0/24 \
  --instance-id ${WORKER_0_INSTANCE_ID}

aws ec2 create-route \
  --route-table-id ${ROUTE_TABLE_ID_2} \
  --destination-cidr-block 10.200.0.0/24 \
  --instance-id ${WORKER_0_INSTANCE_ID}



aws ec2 create-route \
  --route-table-id ${ROUTE_TABLE_ID_0} \
  --destination-cidr-block 10.200.1.0/24 \
  --instance-id ${WORKER_1_INSTANCE_ID}

aws ec2 create-route \
  --route-table-id ${ROUTE_TABLE_ID_1} \
  --destination-cidr-block 10.200.1.0/24 \
  --instance-id ${WORKER_1_INSTANCE_ID}

aws ec2 create-route \
  --route-table-id ${ROUTE_TABLE_ID_2} \
  --destination-cidr-block 10.200.1.0/24 \
  --instance-id ${WORKER_1_INSTANCE_ID}



aws ec2 create-route \
  --route-table-id ${ROUTE_TABLE_ID_0} \
  --destination-cidr-block 10.200.2.0/24 \
  --instance-id ${WORKER_2_INSTANCE_ID}

aws ec2 create-route \
  --route-table-id ${ROUTE_TABLE_ID_1} \
  --destination-cidr-block 10.200.2.0/24 \
  --instance-id ${WORKER_2_INSTANCE_ID}

aws ec2 create-route \
  --route-table-id ${ROUTE_TABLE_ID_2} \
  --destination-cidr-block 10.200.2.0/24 \
  --instance-id ${WORKER_2_INSTANCE_ID}

List routes in VPC:

aws ec2 describe-route-tables --route-table-ids ${ROUTE_TABLE_ID_0} | \
  jq -j '.RouteTables[].Routes[] | .DestinationCidrBlock, " ", .NetworkInterfaceId // .GatewayId, " ", .State, "\n"' 
aws ec2 describe-route-tables --route-table-ids ${ROUTE_TABLE_ID_1} | \
  jq -j '.RouteTables[].Routes[] | .DestinationCidrBlock, " ", .NetworkInterfaceId // .GatewayId, " ", .State, "\n"' 
aws ec2 describe-route-tables --route-table-ids ${ROUTE_TABLE_ID_2} | \
  jq -j '.RouteTables[].Routes[] | .DestinationCidrBlock, " ", .NetworkInterfaceId // .GatewayId, " ", .State, "\n"'     

Deploy DNS Cluster Add-on

As a last step we want to have a working DNS add-on which provides DNS based service discovery to all applications running inside our Kubernetes cluster.

The DNS Cluster Add-on

Create kube-dns.yaml file (working with Kubernetes v1.16):

# Copyright 2016 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.32.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  # replicas: not specified here:
  # 1. In order to make Addon Manager do not reconcile this replicas parameter.
  # 2. Default is 1.
  # 3. Will be tuned in real time if DNS horizontal auto-scaling is turned on.
  strategy:
    rollingUpdate:
      maxSurge: 10%
      maxUnavailable: 0
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      volumes:
      - name: kube-dns-config
        configMap:
          name: kube-dns
          optional: true
      containers:
      - name: kubedns
        image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7
        resources:
          # TODO: Set memory limits when we've profiled the container for large
          # clusters, then set request = limit to keep this container in
          # guaranteed class. Currently, this container falls into the
          # "burstable" category so the kubelet doesn't backoff from restarting it.
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        livenessProbe:
          httpGet:
            path: /healthcheck/kubedns
            port: 10054
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          # we poll on pod startup for the Kubernetes master service and
          # only setup the /readiness HTTP server once that's available.
          initialDelaySeconds: 3
          timeoutSeconds: 5
        args:
        - --domain=cluster.local.
        - --dns-port=10053
        - --config-dir=/kube-dns-config
        - --v=2
        env:
        - name: PROMETHEUS_PORT
          value: "10055"
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
        - containerPort: 10055
          name: metrics
          protocol: TCP
        volumeMounts:
        - name: kube-dns-config
          mountPath: /kube-dns-config
      - name: dnsmasq
        image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7
        livenessProbe:
          httpGet:
            path: /healthcheck/dnsmasq
            port: 10054
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        args:
        - -v=2
        - -logtostderr
        - -configDir=/etc/k8s/dns/dnsmasq-nanny
        - -restartDnsmasq=true
        - --
        - -k
        - --cache-size=1000
        - --no-negcache
        - --log-facility=-
        - --server=/cluster.local/127.0.0.1#10053
        - --server=/in-addr.arpa/127.0.0.1#10053
        - --server=/ip6.arpa/127.0.0.1#10053
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        # see: https://github.com/kubernetes/kubernetes/issues/29055 for details
        resources:
          requests:
            cpu: 150m
            memory: 20Mi
        volumeMounts:
        - name: kube-dns-config
          mountPath: /etc/k8s/dns/dnsmasq-nanny
      - name: sidecar
        image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.7
        livenessProbe:
          httpGet:
            path: /metrics
            port: 10054
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        args:
        - --v=2
        - --logtostderr
        - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
        - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
        ports:
        - containerPort: 10054
          name: metrics
          protocol: TCP
        resources:
          requests:
            memory: 20Mi
            cpu: 10m
      dnsPolicy: Default  # Don't use cluster DNS.
      serviceAccountName: kube-dns

Deploy kube-dns to the cluster:

kubectl create -f kube-dns.yaml
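
To verify the add-on - a hedged smoke test, not part of the manifest - wait for the kube-dns pod to become Ready and resolve a service name from a throwaway busybox pod:

kubectl get pods -n kube-system -l k8s-app=kube-dns

# DNS lookup from inside the cluster
kubectl run busybox --image=busybox:1.28 --restart=Never -- sleep 3600
kubectl exec busybox -- nslookup kubernetes
kubectl delete pod busybox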

Final Words

Now have some fun with your custom Kubernetes cluster and deploy some workloads on it - see the smoke test below for a starting point. You can also further enhance the cluster with an Ingress controller (nginx-ingress / aws-alb-ingress).
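
A quick, hedged smoke test: deploy nginx and expose it via NodePort - the Security Groups already allow your workstation to reach the worker nodes on any TCP port:

kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port 80 --type NodePort
kubectl get pods -o wide

# Fetch the assigned NodePort and curl it via a worker's public DNS name
NODE_PORT=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')
curl -I http://worker1.napo.io:${NODE_PORT}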

If you encounter any problems or have some ideas on how to enhance the IaC code ➡️ please let me know!

I’d be very happy to see some Pull Requests on GitHub for the Terraform and CDK Python code of this blog post!