Works with Kubernetes 1.16!

For my preparation for the Cloud Native Computing Foundation's Certified Kubernetes Administrator exam (CNCF CKA for short), it is important to learn the ins and outs of creating Kubernetes clusters by hand. This includes generating all the certificates, systemd unit files and K8s configs, and installing the components.

Most of you may have already seen Kelsey Hightower’s fantastic “Kubernetes The Hard Way” tutorial on GitHub. There is even a LinuxAcademy course and forks of the tutorial for Bare Metal installations (GitHub / Medium). And of course there are already several guides for AWS (here and here).

So why write another one? Simple answer: I could not find a single all-in-one guide for a multi-master, non-stacked Kubernetes setup on AWS. Besides, it was a good use case to take some first baby steps with the AWS CDK for Python :)

If you haven’t already checked out the CDK, give it a try! => https://aws.amazon.com/cdk

So what is this all about?

Executive Summary: it creates the infrastructure and components for a multi-node non-stacked Kubernetes Cluster (v1.16!) on AWS - but doin’ it the real hard way!

Features

  • CDK code available (Python3, generates CloudFormation)
  • Terraform code available (>= 0.12 / HCL2)
  • Multi Master HA Kubernetes control plane
  • Non-Stacked setup (etcd servers are running on their own instances)
  • 10x EC2 instances
    • 1x Bastion Host (single-node ASG)
    • 3x etcd nodes (ASG)
    • 3x Kubernetes Master nodes (ASG)
    • 3x Kubernetes Worker nodes (ASG)
  • 3x Load Balancers
    • Bastion Host LB: for safe access to Bastion Host instance
    • K8s-master-public-LB: external kubectl access
    • K8s-master-private-LB: fronts Kubernetes API Servers
  • Route53 records for all EC2 instances’ internal IPv4 addresses (ease of use)
  • External access via Bastion Host (SSH) & public Load Balancer (kubectl) only
  • Access to BastionLB & MasterPublicLB from your workstation IP only (by default)
  • Strict SecurityGroups

Infrastructure as Code

AWS CDK (Python, generates CloudFormation): hajowieland/cdk-py-k8s-the-real-hard-way-aws

Terraform (>= 0.12 / HCL2): hajowieland/terraform-k8s-the-real-hard-way-aws

Create Infrastructure

⚡ The infrastructure created with Terraform and/or CDK may not be suitable for production usage, but it is safe to use: it creates AutoScalingGroups (with LaunchConfigurations), a Bastion Host and public/private Load Balancers, and assigns tightened SecurityGroups to all resources.

First we need to create our infrastructure; you can use either the AWS CDK or the Terraform repository above. Both create the same infrastructure with ten EC2 instances by default for a fully non-stacked Kubernetes setup. This means the etcd nodes run on their own instances and not on top of the K8s Master nodes (which would be a stacked setup).

If you change the number of nodes, you have to adapt the below instructions accordingly.

In the IaC, set the Route53 Hosted Zone you want to use (Terraform: var.hosted_zone / CDK: zone_fqdn). This will create an A record for the Bastion host like this:

  • bastion.example.com

It also provisions the Bastion Host with a User Data script to install the cfssl binary (by CloudFlare) for easy creation of all the CSRs, certificates and keys.

SecurityGroups

Overview of SecurityGroups:

Component     | Source          | Protocol | Port | Description
Bastion       | Bastion LB      | TCP      | 22   | SSH from Bastion LB
Bastion LB    | Workstation     | TCP      | 22   | SSH from Workstation
etcd          | Bastion         | TCP      | 22   | SSH from Bastion
etcd          | K8s-Master      | TCP      | 2379 | etcd-client
etcd          | K8s-Master      | TCP      | 2380 | etcd-server
K8s-PublicLB  | Workstation     | TCP      | 6443 | kubectl from Workstation
K8s-PrivateLB | Masters         | TCP      | 6443 | kube-api from Masters
K8s-PrivateLB | Workers         | TCP      | 6443 | kube-api from Workers
K8s-Master    | Bastion         | TCP      | 22   | SSH from Bastion
K8s-Master    | MasterPublicLB  | TCP      | 6443 | kubectl from Public LB
K8s-Master    | MasterPrivateLB | TCP      | 6443 | kube-api from Private LB
K8s-Master    | Bastion         | TCP      | 6443 | kubectl access from Bastion
K8s-Master    | K8s-Worker      | any      | any  | Allow any from K8s-Worker
K8s-Worker    | Bastion         | TCP      | 22   | SSH from Bastion
K8s-Worker    | Bastion         | TCP      | 6443 | kubectl access from Bastion
K8s-Worker    | K8s-Master      | any      | any  | Allow any from K8s-Master

Load Balancers

And finally we will have these three Classic Elastic Load Balancers (CLB):

  • K8s-Master Public ELB (for remote kube-apiserver / kubectl access from your workstation)
  • K8s-Master Private ELB (fronts kube-apiservers)
  • Bastion ELB (for secure SSH access into the Private Subnets)

Terraform

To apply the infrastructure with Terraform >=0.12, just clone my repository and create a terraform.tfvars file with your configuration:

git clone git@github.com:hajowieland/terraform-k8s-the-real-hard-way-aws.git

Requirements

  • Terraform 0.12 (macOS: brew install terraform)
  • awscli with default profile configured

Required Variables

You have to change the values of at least these two TF variables in your terraform.tfvars file:

TF Variable | Description                              | Type   | Default | Example
owner       | Your Name                                | string | napo.io | Max Mustermann
hosted_zone | Route53 Hosted Zone Name for DNS records | string | ""      | test.example.com

There are more variables to configure, but most of them have sane default values.

SSH KEY: If you do not specify a pre-existing AWS Key Pair with var.aws_key_pair_name, then Terraform creates a new one from your ~/.ssh/id_rsa key by default.

You can change this path by setting var.ssh_public_key_path.
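
A minimal terraform.tfvars could look like this (the values below are placeholders; adapt them to your own name, Hosted Zone and Key Pair):

cat > terraform.tfvars <<EOF
owner       = "Max Mustermann"
hosted_zone = "test.example.com"
# Optional: use a pre-existing AWS Key Pair instead of letting Terraform create one
# aws_key_pair_name = "my-existing-keypair"
EOF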

Deploy the infrastructure with Terraform:

terraform init
terraform plan
terraform apply

CDK

To apply the infrastructure with the AWS CDK, just clone my repository and edit the cdk_python_k8s_right_way_aws_stack.py file accordingly:

git clone git@github.com:hajowieland/cdk-py-k8s-the-real-hard-way-aws.git

Requirements

  • cdk installed via npm install -g aws-cdk
  • awscli with default profile configured
  • Existing AWS Key Pair
  • Python3
  • In repository => VirtualEnv: virtualenv .env -p python3 and then source .env/bin/activate
  • Install pip3 requirements: pip3 install -r requirements.txt

Required Variables

You have to change the values of at least these two variables in your cdk_python_k8s_right_way_aws_stack.py file:

Variable     | Description              | Type   | Default | Example
ssh_key_pair | Existing AWS Key Pair    | string | id_rsa  | MyKeyPairName
tag_owner    | Your Name                | string | napo.io | Holy Kubernetus
zone_fqdn    | Route53 Hosted Zone Name | string | ''      | test.example.com

There are more variables to configure, most of them have sane default values.

Deploy the infrastructure with CDK:

cdk synth # outputs the rendered CloudFormation code
cdk deploy

BE AWARE:

If you change the default value for project in CDK (tag_project) or Terraform (var.project), then you have to adapt the filters in all of the following aws-cli commands!

Defaults

By default all EC2 instances are created in us-east-1 AWS region.

Connect to Bastion Host

From now on, everything takes place on the Bastion Host.

Connect via SSH, either with the AWS Key Pair name you configured or, when using Terraform, with the newly created AWS Key Pair (CDK/CloudFormation does not support creating Key Pairs):

It can take a few moments until all resources are ready
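
For example, with the default settings this looks roughly like the following (adapt the key path and domain to your setup; the Bastion Host is reachable as ec2-user):

# Example only - replace example.com with your Hosted Zone and id_rsa with your key
ssh -i ~/.ssh/id_rsa ec2-user@bastion.example.com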

Get the internal IPs

On the Bastion Host, we first need to know the IPv4 addresses for all EC2 instances we created (the internal IPs - all nodes are running in private subnets).

In the UserData we already set some global environment variables to make your life easier, so you do not have to set them every time:

# Already set during infrastructure deployment
# Verify the values:
# echo $HOSTEDZONE_NAME && echo $AWS_DEFAULT_REGION
export HOSTEDZONE_NAME=napo.io # << from var.hosted_zone / zone_fqdn
export AWS_DEFAULT_REGION=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document|grep region|awk -F\" '{print $4}') # << AWS Region the EC2 instance is running for use with awscli

Before we continue with the next steps, get the Hosted Zone ID and export it as environment variable:

# Gets HostedZone ID from its name
export HOSTEDZONE_ID=$(aws route53 list-hosted-zones-by-name --dns-name $HOSTEDZONE_NAME --query 'HostedZones[].Id' --output text | cut -d/ -f3)

The next shell commands help us identify the EC2 instances and create Route53 records for ease of use:

  • In a for loop, get all instances by querying the AutoScalingGroups for a specific LaunchConfiguration prefix
  • Tag each EC2 instance with an incrementing number
  • Get the private IP address of the EC2 instance
  • Create a Route53 RecordSet JSON file via heredoc (see example below)
  • Create the Route53 record from the private IP address and the component name plus the incremented number

The temporary JSON files for the Route53 records look like this (just an example):

{
	"Comment":"Create/Update etcd A record",
	"Changes":[{
		"Action":"UPSERT",
		"ResourceRecordSet":{
			"Name":"etcd$i.$ZONENAME",
			"Type":"A",
			"TTL":30, 
			"ResourceRecords":[{
				"Value":"$IP"
			}]
		}
	}]
}

⚠️ Be aware that you have to manually delete these Route53 records when you're finished.

Now execute the following shell commands on the Bastion Host:

Etcd:

i=1
for INSTANCE in $(aws autoscaling describe-auto-scaling-instances --query 'AutoScalingInstances[?starts_with(LaunchConfigurationName, `etcd`)].[InstanceId]' --output text); do
  aws ec2 create-tags --resources $INSTANCE --tags Key=Name,Value=etcd$i
  IP=$(aws ec2 describe-instances --instance-id $INSTANCE --query 'Reservations[].Instances[].[PrivateIpAddress]' --output text)
  cat << EOF > /tmp/record.json
{
"Comment":"Create/Update etcd A record",
"Changes":[{
  "Action":"UPSERT",
  "ResourceRecordSet":{
    "Name":"etcd$i.internal.$HOSTEDZONE_NAME",
    "Type":"A",
    "TTL":30, 
    "ResourceRecords":[{
      "Value":"$IP"
    }]
  }
}]
}
EOF
  aws route53 change-resource-record-sets --hosted-zone-id $HOSTEDZONE_ID --change-batch file:///tmp/record.json
  export ETCD${i}_INTERNAL=$IP
  i=$((i+1))
done

Master:

i=1
for INSTANCE in $(aws autoscaling describe-auto-scaling-instances --query 'AutoScalingInstances[?starts_with(LaunchConfigurationName, `master`)].[InstanceId]' --output text); do
  aws ec2 create-tags --resources $INSTANCE --tags Key=Name,Value=master$i
  IP=$(aws ec2 describe-instances --instance-id $INSTANCE --query 'Reservations[].Instances[].[PrivateIpAddress]' --output text)
  cat << EOF > /tmp/record.json
{
"Comment":"Create/Update master A record",
"Changes":[{
  "Action":"UPSERT",
  "ResourceRecordSet":{
    "Name":"master$i.internal.$HOSTEDZONE_NAME",
    "Type":"A",
    "TTL":30, 
    "ResourceRecords":[{
      "Value":"$IP"
    }]
  }
}]
}
EOF
  aws route53 change-resource-record-sets --hosted-zone-id $HOSTEDZONE_ID --change-batch file:///tmp/record.json
  export MASTER${i}_INTERNAL=$IP
  i=$((i+1))
done

Worker:

i=1
for INSTANCE in $(aws autoscaling describe-auto-scaling-instances --query 'AutoScalingInstances[?starts_with(LaunchConfigurationName, `worker`)].[InstanceId]' --output text); do
  aws ec2 create-tags --resources $INSTANCE --tags Key=Name,Value=worker$i
  IP=$(aws ec2 describe-instances --instance-id $INSTANCE --query 'Reservations[].Instances[].[PrivateIpAddress]' --output text)
  cat << EOF > /tmp/record.json
{
"Comment":"Create/Update worker A record",
"Changes":[{
  "Action":"UPSERT",
  "ResourceRecordSet":{
    "Name":"worker$i.internal.$HOSTEDZONE_NAME",
    "Type":"A",
    "TTL":30, 
    "ResourceRecords":[{
      "Value":"$IP"
    }]
  }
}]
}
EOF
  aws route53 change-resource-record-sets --hosted-zone-id $HOSTEDZONE_ID --change-batch file:///tmp/record.json
  export WORKER${i}_INTERNAL=$IP
  i=$((i+1))
done
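
Before moving on, you can spot-check the created records and the exported variables:

# List the A records that were just created
aws route53 list-resource-record-sets --hosted-zone-id $HOSTEDZONE_ID \
  --query "ResourceRecordSets[?Type=='A'].Name" --output text
# Verify a few of the exported internal IPs
echo $ETCD1_INTERNAL $MASTER1_INTERNAL $WORKER1_INTERNAL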

Setup SSH config

This step makes your life much easier: in the following steps we will use the host aliases configured in the SSH client config file.

ℹ️ Replace the IdentityFile (your OpenSSH key used as AWS EC2 Key Pair) and HostName Domain accordingly - do not forget to add your SSH private key to the Bastion Host (e.g. place it at $HOME/.ssh/id_rsa).

# On your workstation (macOS - and if your private key is id_rsa)
cat ~/.ssh/id_rsa | pbcopy

~/.ssh/id_rsa:

# On Bastion
vi ~/.ssh/id_rsa
chmod 400 ~/.ssh/id_rsa

On the Bastion Host, open the SSH config file and adapt to your setup (replace napo.io with your domain):

~/.ssh/config:

Host etcd1 etcd2 etcd3 master1 master2 master3 worker1 worker2 worker3
  User ubuntu
  HostName %h.internal.napo.io
  IdentityFile ~/.ssh/id_rsa

Then restrict the permissions of the config file:

chmod 600 ~/.ssh/config
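
A quick test that the aliases work (accept the host key prompts on first connect):

ssh etcd1 uptime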

Create Kubernetes Cluster the (real) hard way

With the preparation done, we now start doing Kubernetes The (real) Hard Way on AWS 🥳

Notes

For parallel execution on multiple instances at once, use tmux and this multiplexer script: https://gist.github.com/dmytro/3984680.

Both are already installed on the Bastion Host (the multiplexer script can be found in the ec2-user's $HOME directory).

Later, use tmux and execute the tmux-multi.sh script to run commands on multiple instances (all etcd/master/worker nodes) at once.

But first we create the certificates on the Bastion Host and transfer them to the instances.

Create certificates

Now let's start creating all the things needed for Kubernetes (CA, Certificate Signing Requests, certs, keys, etc.).

Certificate Authority

For our Certificate Authority (Lifetime 17520h => 2 years) we create a CA config file and Certificate Signing Requests (CSRs) for the 4096-bit RSA key:

cat > ca-config.json <<EOF
{
  "signing": {
    "default": {
      "expiry": "17520h"
    },
    "profiles": {
      "kubernetes": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "17520h"
      }
    }
  }
}
EOF
cat > ca-csr.json <<EOF
{
  "CN": "Kubernetes",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "Kubernetes",
      "OU": "Kubernetes The Real Hard Way",
      "ST": "$HOSTEDZONE_NAME"
    }
  ]
}
EOF

Generate the Certificate Authority key from the previous CA config and CSR:

cfssl gencert -initca ca-csr.json | cfssljson -bare ca
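
This produces ca.pem, ca-key.pem and ca.csr in the current directory. If you are curious, you can inspect the CA certificate (assuming openssl is installed on the Bastion Host):

openssl x509 -in ca.pem -noout -subject -dates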

Client and Server Certificates

Now create client and server certificates with their corresponding CSRs:

Admin Client Certificate

cat > admin-csr.json <<EOF
{
  "CN": "admin",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "system:masters",
      "OU": "Kubernetes The Real Hard Way",
      "ST": "$HOSTEDZONE_NAME"
    }
  ]
}
EOF

Generate Admin client key:

cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -profile=kubernetes \
  admin-csr.json | cfssljson -bare admin

Kubelet Client Certificates

Here we get all the Worker nodes, identify them via their LaunchConfiguration name and create the CSRs:

WORKERCOUNT=$(aws autoscaling describe-auto-scaling-instances --query 'AutoScalingInstances[?starts_with(LaunchConfigurationName, `worker`)].[InstanceId]' --output text | wc -l)
i=1
while [ "$i" -le "$WORKERCOUNT" ]; do
  cat > worker${i}-csr.json <<EOF
  {
    "CN": "system:node:worker${i}.internal.${HOSTEDZONE_NAME}",
    "key": {
      "algo": "rsa",
      "size": 4096
    },
    "names": [
      {
        "C": "DE",
        "L": "Munich",
        "O": "system:nodes",
        "OU": "Kubernetes The Real Hard Way",
        "ST": "${HOSTEDZONE_NAME}"
      }
    ]
  }
EOF
  i=$(($i + 1))
done

Create the keys for all Worker nodes:

i=1
while [ "$i" -le "$WORKERCOUNT" ]; do
  cfssl gencert \
    -ca=ca.pem \
    -ca-key=ca-key.pem \
    -config=ca-config.json \
    -hostname=worker${i}.internal.${HOSTEDZONE_NAME} \
    -profile=kubernetes \
    worker${i}-csr.json | cfssljson -bare worker${i}
  i=$(($i + 1))
done

kube-controller-manager Certificate

Generate the kube-controller-manager client certificate and private key:

cat > kube-controller-manager-csr.json <<EOF
{
  "CN": "system:kube-controller-manager",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "system:kube-controller-manager",
      "OU": "Kubernetes The Real Hard Way",
      "ST": "${HOSTEDZONE_NAME}"
    }
  ]
}
EOF
cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -profile=kubernetes \
  kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager

kube-proxy Client Certificate

Now create everything needed for the kube-proxy component.

First, again, the CSR:

cat > kube-proxy-csr.json <<EOF
{
  "CN": "system:kube-proxy",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "system:node-proxier",
      "OU": "Kubernetes The Real Hard Way",
      "ST": "$HOSTEDZONE_NAME"
    }
  ]
}
EOF

… and then generate the key for kube-proxy:

cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -profile=kubernetes \
  kube-proxy-csr.json | cfssljson -bare kube-proxy

kube-scheduler Client Certificate

Generate the kube-scheduler client certificate and private key:

cat > kube-scheduler-csr.json <<EOF
{
  "CN": "system:kube-scheduler",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "system:kube-scheduler",
      "OU": "Kubernetes The Real Hard Way",
      "ST": "$HOSTEDZONE_NAME"
    }
  ]
}
EOF
cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -profile=kubernetes \
  kube-scheduler-csr.json | cfssljson -bare kube-scheduler

kube-controller-manager ServiceAccount Token

To let the kube-controller-manager sign ServiceAccount tokens (see the Kubernetes documentation), create the service-account certificate and private key:

cat > service-account-csr.json <<EOF
{
  "CN": "service-accounts",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "Kubernetes",
      "OU": "Kubernetes The Real Hard Way",
      "ST": "$HOSTEDZONE_NAME"
    }
  ]
}
EOF
cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -profile=kubernetes \
  service-account-csr.json | cfssljson -bare service-account

Kubernetes API Server Certificate

And finally the CSR for kube-apiserver:

cat > kubernetes-csr.json <<EOF
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "DE",
      "L": "Munich",
      "O": "Kubernetes",
      "OU": "Kubernetes The Real Hard Way",
      "ST": "$HOSTEDZONE_NAME"
    }
  ]
}
EOF

For generating the kube-apiserver certificate, we need to include all the IP addresses and hostnames under which the API server will be reached.

ℹ️ NOTE: 10.32.0.1 ==> kubernetes.default.svc.cluster.local.

https://github.com/kelseyhightower/kubernetes-the-hard-way/issues/105

First get the Kubernetes Master ELBs DNS names via their prefixes and assign them to envvars:

MASTER_ELB_PRIVATE=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[? starts_with(DNSName, `internal-master`)]| [].DNSName' --output text)
MASTER_ELB_PUBLIC=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[? starts_with(DNSName, `master`)]| [].DNSName' --output text)
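
Double-check that both variables are set, as they are needed for the certificate's hostnames below:

echo $MASTER_ELB_PRIVATE
echo $MASTER_ELB_PUBLIC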

Generate the API Server certificate and key, but adapt the number of etcd, worker and master nodes to your setup (by default three nodes each).

We use the envvars we exported earlier:

cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -hostname=10.32.0.1,${ETCD1_INTERNAL},\
    ${ETCD2_INTERNAL},${ETCD3_INTERNAL},\
    ${MASTER1_INTERNAL},${MASTER2_INTERNAL},\
    ${MASTER3_INTERNAL},${WORKER1_INTERNAL},\
    ${WORKER2_INTERNAL},${WORKER3_INTERNAL},\
    etcd1.internal.${HOSTEDZONE_NAME},\
    etcd2.internal.${HOSTEDZONE_NAME},\
    etcd3.internal.${HOSTEDZONE_NAME},\
    master1.internal.${HOSTEDZONE_NAME},\
    master2.internal.${HOSTEDZONE_NAME},\
    master3.internal.${HOSTEDZONE_NAME},\
    worker1.internal.${HOSTEDZONE_NAME},\
    worker2.internal.${HOSTEDZONE_NAME},\
    worker3.internal.${HOSTEDZONE_NAME},\
    ${MASTER_ELB_PRIVATE},${MASTER_ELB_PUBLIC},\
    127.0.0.1,kubernetes.default \
  -profile=kubernetes \
  kubernetes-csr.json | cfssljson -bare kubernetes
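
You can verify that all hostnames and IP addresses made it into the certificate's Subject Alternative Names (assuming openssl is installed):

openssl x509 -in kubernetes.pem -noout -text | grep -A1 'Subject Alternative Name'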

Distribute the Client and Server Certificates

Now with everything in place, we scp all certificates to the instances.

ℹ️ The below commands only work if you have created ~/.ssh/config entries as stated at the beginning!

If you have changed the default, adapt the number of etcd/master/worker nodes to match your setup

etcd

for etcd in etcd1 etcd2 etcd3; do
  scp ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem ${etcd}:~/
done

Masters/Controllers

for master in master1 master2 master3; do
  scp ca.pem ca-key.pem kubernetes-key.pem \
  kubernetes.pem service-account-key.pem service-account.pem \
  ${master}:~/
done

Workers

for worker in worker1 worker2 worker3; do
  scp ca.pem ${worker}-key.pem ${worker}.pem ${worker}:~/
done

Generating Kubernetes Configuration Files for Authentication

Now in this step we generate the files needed for authentication in Kubernetes.

Client Authentication Configs

kubelet Kubernetes Configuration Files

Generate kubeconfig configuration files for kubelet of every worker:

for i in 1 2 3; do
  instance="worker${i}"
  instance_hostname="worker${i}.internal.$HOSTEDZONE_NAME"
  kubectl config set-cluster kubernetes-the-real-hard-way \
    --certificate-authority=ca.pem \
    --embed-certs=true \
    --server=https://${MASTER_ELB_PRIVATE}:6443 \
    --kubeconfig=${instance}.kubeconfig

  kubectl config set-credentials system:node:${instance_hostname} \
    --client-certificate=${instance}.pem \
    --client-key=${instance}-key.pem \
    --embed-certs=true \
    --kubeconfig=${instance}.kubeconfig

  kubectl config set-context default \
    --cluster=kubernetes-the-real-hard-way \
    --user=system:node:${instance_hostname} \
    --kubeconfig=${instance}.kubeconfig

  kubectl config use-context default \
    --kubeconfig=${instance}.kubeconfig
done

The kube-proxy Kubernetes Configuration File

Generate the kube-proxy kubeconfig:

kubectl config set-cluster kubernetes-the-real-hard-way \
  --certificate-authority=ca.pem \
  --embed-certs=true \
  --server=https://${MASTER_ELB_PRIVATE}:6443 \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config set-credentials kube-proxy \
  --client-certificate=kube-proxy.pem \
  --client-key=kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config set-context default \
  --cluster=kubernetes-the-real-hard-way \
  --user=kube-proxy \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config use-context default \
  --kubeconfig=kube-proxy.kubeconfig

The kube-controller-manager Kubernetes Configuration File

Generate the kube-controller-manager kubeconfig:

kubectl config set-cluster kubernetes-the-real-hard-way \
  --certificate-authority=ca.pem \
  --embed-certs=true \
  --server=https://127.0.0.1:6443 \
  --kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-credentials system:kube-controller-manager \
  --client-certificate=kube-controller-manager.pem \
  --client-key=kube-controller-manager-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-context default \
  --cluster=kubernetes-the-real-hard-way \
  --user=system:kube-controller-manager \
  --kubeconfig=kube-controller-manager.kubeconfig
kubectl config use-context default --kubeconfig=kube-controller-manager.kubeconfig

The kube-scheduler Kubernetes Configuration File

Generate the kubeconfig file for the kube-scheduler component:

kubectl config set-cluster kubernetes-the-real-hard-way \
  --certificate-authority=ca.pem \
  --embed-certs=true \
  --server=https://127.0.0.1:6443 \
  --kubeconfig=kube-scheduler.kubeconfig
kubectl config set-credentials system:kube-scheduler \
  --client-certificate=kube-scheduler.pem \
  --client-key=kube-scheduler-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-scheduler.kubeconfig
kubectl config set-context default \
  --cluster=kubernetes-the-real-hard-way \
  --user=system:kube-scheduler \
  --kubeconfig=kube-scheduler.kubeconfig
kubectl config use-context default --kubeconfig=kube-scheduler.kubeconfig

The admin Kubernetes Configuration File

And finally, the kubeconfig file for our admin user (that’s you 🙂):

kubectl config set-cluster kubernetes-the-real-hard-way \
  --certificate-authority=ca.pem \
  --embed-certs=true \
  --server=https://127.0.0.1:6443 \
  --kubeconfig=admin.kubeconfig
kubectl config set-credentials admin \
  --client-certificate=admin.pem \
  --client-key=admin-key.pem \
  --embed-certs=true \
  --kubeconfig=admin.kubeconfig
kubectl config set-context default \
  --cluster=kubernetes-the-real-hard-way \
  --user=admin \
  --kubeconfig=admin.kubeconfig
kubectl config use-context default --kubeconfig=admin.kubeconfig

Distribute the Kubernetes Configuration Files

Now transfer the kubelet & kube-proxy kubeconfig files to the worker nodes:

for worker in worker1 worker2 worker3; do
  scp ${worker}.kubeconfig kube-proxy.kubeconfig ${worker}:~/
done

And then the admin, kube-controller-manager & kube-scheduler kubeconfig files to the master nodes:

for master in master1 master2 master3; do
  scp admin.kubeconfig \
  kube-controller-manager.kubeconfig \
  kube-scheduler.kubeconfig \
  ${master}:~/
done

Generating the Data Encryption Config and Key

For encryption of Kubernetes Secrets at rest, we first create a random 32-byte encryption key and then the EncryptionConfiguration.

The Encryption Key

ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)

The Encryption Config File

cat > encryption-config.yaml <<EOF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: ${ENCRYPTION_KEY}
      - identity: {}
EOF

Transfer the encryption config file to the master nodes:

for master in master1 master2 master3; do
  scp encryption-config.yaml ${master}:~/
done

Bootstrapping the etcd Cluster

Now it is time to bootstrap our etcd cluster, which is our highly available key-value store for the Kubernetes API.

Think of it as the Kube API’s persistent storage for saving the state of all resources.

Now it is time to use the power of tmux and the multiplexer script:

  • Start tmux
  • Execute $HOME/tmux-multi.sh
  • Enter etcd1 etcd2 etcd3 (or more, according to your setup and how you configured your SSH config at the beginning)

Now we can execute the following commands in parallel on each etcd node.

First we get the etcdhost internal IPv4 address and set the hostname:

export ETCDHOST=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)" "Name=key,Values=Name" --output=text | cut -f 5)
sudo hostnamectl set-hostname --static $ETCDHOST.internal.$HOSTEDZONE_NAME
echo "$INTERNAL_IP $ETCDHOST.internal.$HOSTEDZONE_NAME" | sudo tee -a /etc/hosts

Install etcd and move the files:

wget -q --show-progress --https-only --timestamping \
  "https://github.com/etcd-io/etcd/releases/download/v3.4.3/etcd-v3.4.3-linux-amd64.tar.gz"
{
  tar -xvf etcd-v3.4.3-linux-amd64.tar.gz
  sudo mv etcd-v3.4.3-linux-amd64/etcd* /usr/local/bin/
}
{
  sudo mkdir -p /etc/etcd /var/lib/etcd
  sudo cp ca.pem kubernetes-key.pem kubernetes.pem /etc/etcd/
}

Get the etcd nodes' IPv4 addresses and export them as envvars.

Again: by default you have three etcd nodes - adapt to your setup if necessary.

for i in 1 2 3; do export ETCD${i}_INTERNAL=$(dig +short etcd${i}.internal.${HOSTEDZONE_NAME}); done

Generate the etcd systemd unit file:

cat > etcd.service <<EOF
[Unit]
Description=etcd
Documentation=https://github.com/coreos

[Service]
ExecStart=/usr/local/bin/etcd \\
  --name ${ETCDHOST}.internal.${HOSTEDZONE_NAME} \\
  --cert-file=/etc/etcd/kubernetes.pem \\
  --key-file=/etc/etcd/kubernetes-key.pem \\
  --peer-cert-file=/etc/etcd/kubernetes.pem \\
  --peer-key-file=/etc/etcd/kubernetes-key.pem \\
  --trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-client-cert-auth \\
  --client-cert-auth \\
  --initial-advertise-peer-urls https://${INTERNAL_IP}:2380 \\
  --listen-peer-urls https://${INTERNAL_IP}:2380 \\
  --listen-client-urls https://${INTERNAL_IP}:2379,http://127.0.0.1:2379 \\
  --advertise-client-urls https://${INTERNAL_IP}:2379 \\
  --initial-cluster-token etcd-cluster-0 \\
  --initial-cluster etcd1.internal.${HOSTEDZONE_NAME}=https://${ETCD1_INTERNAL}:2380,etcd2.internal.${HOSTEDZONE_NAME}=https://${ETCD2_INTERNAL}:2380,etcd3.internal.${HOSTEDZONE_NAME}=https://${ETCD3_INTERNAL}:2380 \\
  --initial-cluster-state new \\
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Move the files to the right place, reload systemd and enable + start the etcd service:

sudo mv etcd.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd

Check if etcd works

Check for any errors in systemd:

systemctl status etcd

List etcd members:

ETCDCTL_API=3 etcdctl member list

The output should look similar to this:

2d2d6426a2ba46f2, started, etcd3.internal.napo.io, https://10.23.1.109:2380, https://10.23.1.109:2379, false
7e1b60cbd871ed2f, started, etcd1.internal.napo.io, https://10.23.3.168:2380, https://10.23.3.168:2379, false
a879f686f293ea99, started, etcd2.internal.napo.io, https://10.23.2.33:2380, https://10.23.2.33:2379, false
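
You can also query the endpoint status (etcdctl defaults to 127.0.0.1:2379, which we allowed via --listen-client-urls):

ETCDCTL_API=3 etcdctl endpoint status --write-out=table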

Debug

⚠️ If you somehow messed up your etcd cluster, you can start the key-value store from scratch like this (reference: https://github.com/etcd-io/etcd/issues/10101):

ETCDCTL_API=3 etcdctl del "" --from-key=true
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/default.etcd
sudo systemctl start etcd

Bootstrapping the Kubernetes Control Plane

Now that we have our working etcd cluster, it is time to bootstrap our Kubernetes Master Nodes.

Exit the tmux multiplexer on the etcd nodes so that you're back on the Bastion Host.

Now execute $HOME/tmux-multi.sh again and type in the master nodes:

SSH to master1 master2 master3 via the tmux multiplexer and execute the following in parallel on each master node.

First we get the masterhost internal IPv4 address and set the hostname:

export MASTERHOST=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)" "Name=key,Values=Name" --output=text | cut -f 5)
sudo hostnamectl set-hostname --static $MASTERHOST.internal.$HOSTEDZONE_NAME
echo "$INTERNAL_IP $MASTERHOST.internal.$HOSTEDZONE_NAME" | sudo tee -a /etc/hosts

Get the latest stable Kubernetes version (currently 1.16.3 as of this writing):

KUBERNETES_STABLE=$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)
echo $KUBERNETES_STABLE

Generate the Kubernetes config directory, download kube components and move them to /usr/local/bin:

sudo mkdir -p /etc/kubernetes/config
wget -q --show-progress --https-only --timestamping \
  "https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-apiserver" \
  "https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-controller-manager" \
  "https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-scheduler" \
  "https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kubectl"
chmod +x kube-apiserver kube-controller-manager kube-scheduler kubectl
sudo mv kube-apiserver kube-controller-manager kube-scheduler kubectl /usr/local/bin/

Create the directory for certificates, keys and encryption config and move them there:

sudo mkdir -p /var/lib/kubernetes/
sudo mv ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem encryption-config.yaml /var/lib/kubernetes/

Get the etcd nodes IPv4 addresses for the systemd unit file generation:

for i in 1 2 3; do export ETCD${i}_INTERNAL=$(dig +short etcd${i}.internal.${HOSTEDZONE_NAME}); done

Create the kube-apiserver systemd file.

Here all the fun takes place: options and parameters for kube-apiserver.

You can find the current documentation of all options here: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/

cat > kube-apiserver.service <<EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-apiserver \\
  --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,PersistentVolumeClaimResize,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota \\
  --advertise-address=${INTERNAL_IP} \\
  --allow-privileged=true \\
  --apiserver-count=3 \\
  --audit-log-maxage=30 \\
  --audit-log-maxbackup=3 \\
  --audit-log-maxsize=100 \\
  --audit-log-path=/var/log/audit.log \\
  --authorization-mode=Node,RBAC \\
  --bind-address=0.0.0.0 \\
  --client-ca-file=/var/lib/kubernetes/ca.pem \\
  --etcd-cafile=/var/lib/kubernetes/ca.pem \\
  --etcd-certfile=/var/lib/kubernetes/kubernetes.pem \\
  --etcd-keyfile=/var/lib/kubernetes/kubernetes-key.pem \\
  --etcd-servers=https://${ETCD1_INTERNAL}:2379,https://${ETCD2_INTERNAL}:2379,https://${ETCD3_INTERNAL}:2379 \\
  --event-ttl=1h \\
  --encryption-provider-config=/var/lib/kubernetes/encryption-config.yaml \\
  --insecure-bind-address=127.0.0.1 \\
  --kubelet-certificate-authority=/var/lib/kubernetes/ca.pem \\
  --kubelet-client-certificate=/var/lib/kubernetes/kubernetes.pem \\
  --kubelet-client-key=/var/lib/kubernetes/kubernetes-key.pem \\
  --kubelet-https=true \\
  --runtime-config=api/all \\
  --service-account-key-file=/var/lib/kubernetes/ca-key.pem \\
  --service-cluster-ip-range=10.32.0.0/24 \\
  --service-node-port-range=30000-32767 \\
  --tls-cert-file=/var/lib/kubernetes/kubernetes.pem \\
  --tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \\
  --v=5
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Move kube-controller-manager kubeconfig to Kubernetes directory:

sudo mv kube-controller-manager.kubeconfig /var/lib/kubernetes/

Create kube-controller-manager systemd unit file:

cat > kube-controller-manager.service <<EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-controller-manager \\
  --address=0.0.0.0 \\
  --cluster-cidr=10.200.0.0/16 \\
  --cluster-name=kubernetes \\
  --cluster-signing-cert-file=/var/lib/kubernetes/ca.pem \\
  --cluster-signing-key-file=/var/lib/kubernetes/ca-key.pem \\
  --leader-elect=true \\
  --master=http://127.0.0.1:8080 \\
  --root-ca-file=/var/lib/kubernetes/ca.pem \\
  --service-account-private-key-file=/var/lib/kubernetes/ca-key.pem \\
  --service-cluster-ip-range=10.32.0.0/24 \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Move kube-scheduler kubeconfig to Kubernetes directory:

sudo mv kube-scheduler.kubeconfig /var/lib/kubernetes/

Create the KubeSchedulerConfiguration config:

cat <<EOF | sudo tee /etc/kubernetes/config/kube-scheduler.yaml
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/var/lib/kubernetes/kube-scheduler.kubeconfig"
leaderElection:
  leaderElect: true
EOF

Create the kube-scheduler systemd unit file:

cat > kube-scheduler.service <<EOF
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-scheduler \\
  --leader-elect=true \\
  --master=http://127.0.0.1:8080 \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Move the files to the right place, reload systemd and enable + start the kube-* services:

sudo mv kube-apiserver.service kube-scheduler.service kube-controller-manager.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable kube-apiserver kube-controller-manager kube-scheduler
sudo systemctl start kube-apiserver kube-controller-manager kube-scheduler

Verify that everything works

Sadly, kubectl get componentstatuses (short: kubectl get cs) is somewhat deprecated and does not render correctly with Kubernetes 1.16 - the table output gets mixed up: https://github.com/kubernetes/kubernetes/issues/83024

But we can check with increased verbosity if everything is healthy:

# Gives lots of output
# Use the curl commands below for health checking
kubectl get cs -v=8

Additionally we check for errors via systemd:

systemctl status kube-apiserver
systemctl status kube-controller-manager
systemctl status kube-scheduler

curl the /healthz health check endpoint (you should get an HTTP 200 back):

curl --cacert /var/lib/kubernetes/ca.pem \
  --key /var/lib/kubernetes/kubernetes-key.pem \
  --cert /var/lib/kubernetes/kubernetes.pem \
  -i https://127.0.0.1:6443/healthz

If you’re curious, you can check the version info, too:

curl --cacert /var/lib/kubernetes/ca.pem \
  --key /var/lib/kubernetes/kubernetes-key.pem \
  --cert /var/lib/kubernetes/kubernetes.pem \
  -i https://127.0.0.1:6443/version

If everything looks good we can now move on to RBAC.

RBAC for Kubelet Authorization

Role-based access control (RBAC) is the authorization concept in Kubernetes. We need to create a ClusterRole and its ClusterRoleBinding so that the kube-apiserver is allowed to access the kubelet API on the worker nodes.

Exit the tmux multiplexer and SSH to the first Master instance (master1):

ssh master1

Create and apply a ClusterRole that authorizes the kube-apiserver (Master) to access the kubelet API (Worker):

cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-apiserver-to-kubelet
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/proxy
      - nodes/stats
      - nodes/log
      - nodes/spec
      - nodes/metrics
    verbs:
      - "*"
EOF

Create and apply the corresponding ClusterRoleBinding for the above ClusterRole:

cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: system:kube-apiserver
  namespace: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-apiserver-to-kubelet
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: kubernetes
EOF
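
Verify that both objects exist:

kubectl get clusterrole system:kube-apiserver-to-kubelet
kubectl get clusterrolebinding system:kube-apiserver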

Bootstrapping the Kubernetes Worker Nodes

Now we provision the Worker Nodes, which run the Pods in our cluster. They do all the heavy lifting and run most of the user workloads we deploy on Kubernetes.

Provisioning Kubernetes Worker Nodes

SSH to worker1 worker2 worker3 via tmux multiplexer and execute in parallel on each worker node.

First we get the workerhost internal IPv4 address and set the hostname:

export WORKERHOST=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)" "Name=key,Values=Name" --output=text | cut -f 5)
sudo hostnamectl set-hostname --static $WORKERHOST.internal.$HOSTEDZONE_NAME
echo "$INTERNAL_IP $WORKERHOST.internal.$HOSTEDZONE_NAME" | sudo tee -a /etc/hosts

Get current Kubernetes stable version:

KUBERNETES_STABLE=$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)

Install the OS dependencies on Ubuntu via apt-get:

sudo apt-get update
sudo apt-get -y install socat conntrack ipset

Download & Install Worker Binaries

Download the CNI plugins, containerd, and the worker binaries (kubelet, kube-proxy, kubectl):

wget -q --show-progress --https-only --timestamping \
  https://github.com/containernetworking/plugins/releases/download/v0.8.2/cni-plugins-linux-amd64-v0.8.2.tgz \
  https://github.com/containerd/containerd/releases/download/v1.3.0/containerd-1.3.0.linux-amd64.tar.gz \
  https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kubectl \
  https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-proxy \
  https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kubelet

Create the installation directories:

sudo mkdir -p \
  /etc/cni/net.d \
  /opt/cni/bin \
  /var/lib/kubelet \
  /var/lib/kube-proxy \
  /var/lib/kubernetes \
  /var/run/kubernetes

and finally extract and move the CNI plugins and binaries there:

sudo tar -xvf cni-plugins-linux-amd64-v0.8.2.tgz -C /opt/cni/bin/
sudo tar -xvf containerd-1.3.0.linux-amd64.tar.gz -C /
chmod +x kubectl kube-proxy kubelet
sudo mv kubectl kube-proxy kubelet /usr/local/bin/

Configure CNI Networking

Now we configure the CIDR ranges for the Pod network. This is the network the Pods on every Worker node use to communicate with each other across nodes.

We configure the Kubernetes CNI here with plain bridge and loopback interfaces and add the routes to the AWS Route Tables later. We could of course use an overlay network CNI like Flannel or Calico, but for our "the hard way" setup there is more to learn by creating it ourselves.

But please play around with other CNIs later, get to know their pros and cons, and learn when it makes sense to use one over the other (for example because of NetworkPolicies).

ℹ️ A little shady trick 🤯 I do in the IaC in every worker node's UserData:

It generates a random number between 10 and 250 and exports the resulting Pod CIDR as the environment variable POD_CIDR. This envvar is used in the next command for creating the bridge config.

Default value of POD_CIDR: 10.200.$RANDOM_NUMBER.0/24

echo $POD_CIDR

Create the bridge network configuration file:

cat <<EOF | sudo tee /etc/cni/net.d/10-bridge.conf
{
    "cniVersion": "0.3.1",
    "name": "bridge",
    "type": "bridge",
    "bridge": "cnio0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "ranges": [
          [{"subnet": "${POD_CIDR}"}]
        ],
        "routes": [{"dst": "0.0.0.0/0"}]
    }
}
EOF

Create the loopback network configuration file:

cat <<EOF | sudo tee /etc/cni/net.d/99-loopback.conf
{
    "cniVersion": "0.3.1",
    "type": "loopback"
}
EOF

Configure containerd

Install runc, a CLI tool for spawning and running containers according to the OCI runtime specification.

sudo apt-get install runc -y

Create the containerd configuration TOML file:

sudo mkdir -p /etc/containerd/

cat << EOF | sudo tee /etc/containerd/config.toml
[plugins]
  [plugins.cri.containerd]
    snapshotter = "overlayfs"
    [plugins.cri.containerd.default_runtime]
      runtime_type = "io.containerd.runtime.v1.linux"
      runtime_engine = "/usr/sbin/runc"
      runtime_root = ""
    [plugins.cri.containerd.untrusted_workload_runtime]
      runtime_type = "io.containerd.runtime.v1.linux"
      runtime_engine = "/usr/sbin/runsc"
      runtime_root = "/run/containerd/runsc"
EOF

ℹ️ INFO: Untrusted workloads would be run using the gVisor (runsc) container runtime sandbox (note that runsc itself is not installed in this guide).

Create the containerd.service systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity

[Install]
WantedBy=multi-user.target
EOF

Configure the Kubelet

Now to the configuration of the kubelet. Move the certs, keys and kubeconfig to the right directories:

sudo mv $(hostname -s)-key.pem /var/lib/kubelet/
sudo mv $(hostname -s).pem /var/lib/kubelet/
sudo mv $(hostname -s).kubeconfig /var/lib/kubelet/kubeconfig
sudo mv ca.pem /var/lib/kubernetes/

Create a simple kubelet configuration file (KubeletConfiguration):

cat <<EOF | sudo tee /var/lib/kubelet/kubelet-config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: "/var/lib/kubernetes/ca.pem"
authorization:
  mode: Webhook
clusterDomain: "cluster.local"
clusterDNS:
  - "10.32.0.10"
podCIDR: "${POD_CIDR}"
runtimeRequestTimeout: "15m"
tlsCertFile: "/var/lib/kubelet/$(hostname -s).pem"
tlsPrivateKeyFile: "/var/lib/kubelet/$(hostname -s)-key.pem"
EOF

Create the kubelet.service systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=containerd.service
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \\
  --config=/var/lib/kubelet/kubelet-config.yaml \\
  --container-runtime=remote \\
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \\
  --image-pull-progress-deadline=2m \\
  --kubeconfig=/var/lib/kubelet/kubeconfig \\
  --network-plugin=cni \\
  --register-node=true \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Configure the Kubernetes Proxy

And finally the kube-proxy configuration.

Move the kube-proxy kubeconfig to the right directory:

sudo mv kube-proxy.kubeconfig /var/lib/kube-proxy/kubeconfig

Create the kube-proxy-config.yaml configuration file. Here we define the overall Cluster CIDR network range (10.200.0.0/16):

cat <<EOF | sudo tee /var/lib/kube-proxy/kube-proxy-config.yaml
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
clientConnection:
  kubeconfig: "/var/lib/kube-proxy/kubeconfig"
mode: "iptables"
clusterCIDR: "10.200.0.0/16"
EOF

Create the kube-proxy.service systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/kube-proxy.service
[Unit]
Description=Kubernetes Kube Proxy
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-proxy \\
  --config=/var/lib/kube-proxy/kube-proxy-config.yaml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Start the Worker Services

Reload systemd and enable + start the containerd, kubelet and kube-proxy services:

sudo systemctl daemon-reload
sudo systemctl enable containerd kubelet kube-proxy
sudo systemctl start containerd kubelet kube-proxy

Check via systemd that there are no errors:

sudo systemctl status containerd
sudo systemctl status kubelet
sudo systemctl status kube-proxy

Verification

Exit the multiplexer so that you're back on the Bastion Host and copy the admin.kubeconfig file to the first master node (master1):

scp admin.kubeconfig master1:~/

Connect to first master server via SSH and get the worker nodes via kubectl:

ssh master1 "kubectl get nodes --kubeconfig admin.kubeconfig"

OUTPUT:

NAME                             STATUS   ROLES    AGE   VERSION
worker1.internal.napo.io   Ready    <none>   90s   v1.16.3
worker2.internal.napo.io   Ready    <none>   90s   v1.16.3
worker3.internal.napo.io   Ready    <none>   90s   v1.16.3

Configuring kubectl for Remote Access

We want to access the Kubernetes Cluster with the kubectl commandline utility from our Bastion Host as well as from our local Workstation.

👍 The Bastion Host already has kubectl installed.

=> On your workstation, you can see here how to install kubectl for all Operating Systems.

The Admin Kubernetes Configuration File

We need to configure the Kubernetes API server endpoint we want to connect to. For High Availability we created an internal Load Balancer that fronts the Kubernetes Master servers (kube-apiservers). The public Load Balancer's DNS name is our external endpoint for remote access.

Internally, e.g. on the Bastion Host, we use the internal Load Balancer, while for external access we use the public-facing one. This may sound unnecessary, but this way we can tighten the SecurityGroups even more.

Bastion Host / Internal access

Generate the kubeconfig file suitable for authenticating as admin user on the Bastion Host:

MASTER_ELB_PRIVATE=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[? starts_with(DNSName, `internal-master`)]| [].DNSName' --output text)

kubectl config set-cluster kubernetes-the-real-hard-way \
  --certificate-authority=ca.pem \
  --embed-certs=true \
  --server=https://${MASTER_ELB_PRIVATE}:6443

kubectl config set-credentials admin \
  --client-certificate=admin.pem \
  --client-key=admin-key.pem

kubectl config set-context kubernetes-the-real-hard-way \
  --cluster=kubernetes-the-real-hard-way \
  --user=admin

kubectl config use-context kubernetes-the-real-hard-way

Verify everything works from Bastion Host:

kubectl get nodes

Workstation / Remote access

Copy the admin client cert and key together with the CA cert from the Bastion Host to your local workstation:

  • ~/ca.pem
  • ~/admin.pem
  • ~/admin-key.pem

Generate the kubeconfig file suitable for authenticating as admin user on your workstation.

  • you may have to set --region us-east-1 (or the region where your infrastructure is running)
  • you may have to edit the paths to the certs and key if they aren't in the current directory

MASTER_ELB_PUBLIC=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[? starts_with(DNSName, `master`)]| [].DNSName' --region us-east-1 --output text)

kubectl config set-cluster kubernetes-the-real-hard-way \
  --certificate-authority=ca.pem \
  --embed-certs=true \
  --server=https://${MASTER_ELB_PUBLIC}:6443

kubectl config set-credentials admin \
  --client-certificate=admin.pem \
  --client-key=admin-key.pem

kubectl config set-context kubernetes-the-real-hard-way \
  --cluster=kubernetes-the-real-hard-way \
  --user=admin

kubectl config use-context kubernetes-the-real-hard-way

Verify everything works from your Workstation:

kubectl get nodes

Hooray congratulations 🤗

Now we have safe remote access to our Kubernetes Cluster. But to really use it, we have to configure the Pod Network routes in the next step.

BE AWARE: If your workstation IP changes, you have to update the MasterPublicLB SecurityGroup to keep access to Kubernetes!

Provisioning Pod Network Routes

Pods scheduled to a node receive an IP address from the node's Pod CIDR range (the POD_CIDR envvar). At this point Pods cannot communicate with Pods running on other nodes because the network routes are missing.

Now it's time to create the routes in the private AWS Route Tables. This establishes a network route from each Worker node's POD_CIDR to that node's internal IPv4 address.

ℹ️ This way we do not have to install any additional CNI.

As mentioned before, we could also use Flannel or some other way of achieving Kubernetes networking.

Routes

Connect back to the Bastion Host and create the network routes for each worker instance via aws-cli.

First get all private Route Tables and save them into the Bash Array ROUTE_TABLES:

ROUTE_TABLES=($(aws ec2 describe-route-tables --filters "Name=tag:Attribute,Values=private"  --query 'RouteTables[].Associations[].[RouteTableId]' --region us-east-1 --output text))

The next command connects to the Worker nodes via SSH, gets the value of the POD_CIDR envvar and saves it into the Bash array WORKER_POD_CIDRS:

WORKER_POD_CIDRS=()
for i in 1 2 3; do
  WORKER_POD_CIDRS+=($(ssh worker$i 'echo $POD_CIDR'))
done

Now create the Routes for every worker node’s POD_CIDR to the node’s ENI (Elastic Network Interface):

for rt in ${ROUTE_TABLES[@]}; do
  i=1
  for cidr in ${WORKER_POD_CIDRS[@]}; do
    ENI_ID=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=worker${i}" --query 'Reservations[].Instances[].NetworkInterfaces[].[NetworkInterfaceId]' --output text)
    echo "${rt}: ${cidr} => ${ENI_ID}"
    aws ec2 create-route \
      --route-table-id ${rt} \
      --destination-cidr-block ${cidr} \
      --network-interface-id ${ENI_ID}
    i=$((i+1))
  done
done

OUTPUT:

You should see Return: true for (number of Workers) x (number of private Route Tables) = 3 x 3 = 9 routes by default:

rtb-093ea7f2ab5e6c2d6: 10.200.188.0/24 => eni-0f9e482a3d6ac5797
{
    "Return": true
}
rtb-093ea7f2ab5e6c2d6: 10.200.166.0/24 => eni-0487ae6ec86bbef5c
{
    "Return": true
}
rtb-093ea7f2ab5e6c2d6: 10.200.152.0/24 => eni-009f0deb164d3fafa
{
    "Return": true
}
rtb-00b8aae6926b2e250: 10.200.188.0/24 => eni-0f9e482a3d6ac5797
{
    "Return": true
}
rtb-00b8aae6926b2e250: 10.200.166.0/24 => eni-0487ae6ec86bbef5c
{
    "Return": true
}
rtb-00b8aae6926b2e250: 10.200.152.0/24 => eni-009f0deb164d3fafa
{
    "Return": true
}
rtb-03288ee836e727375: 10.200.188.0/24 => eni-0f9e482a3d6ac5797
{
    "Return": true
}
rtb-03288ee836e727375: 10.200.166.0/24 => eni-0487ae6ec86bbef5c
{
    "Return": true
}
rtb-03288ee836e727375: 10.200.152.0/24 => eni-009f0deb164d3fafa
{
    "Return": true
}

Verify the routes:

for rt in ${ROUTE_TABLES[@]}; do
  aws ec2 describe-route-tables --route-table-ids ${rt} | \
    jq -j '.RouteTables[].Routes[] | .DestinationCidrBlock, " ", .NetworkInterfaceId // .GatewayId, " ", .State, "\n"'
done

Deploy DNS Cluster Add-on

And as our last step, we configure a DNS add-on which provides DNS based service discovery to all applications running inside our Kubernetes cluster.

The DNS Cluster Add-on

Create kube-dns.yaml file (working with Kubernetes v1.16):

# Copyright 2016 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# CHANGELOG:
# 08/11/2019
# Support for Kubernetes v1.16 added
# by @hajowieland https://wieland.tech | https://napo.io
#
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.32.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  # replicas: not specified here:
  # 1. In order to make Addon Manager do not reconcile this replicas parameter.
  # 2. Default is 1.
  # 3. Will be tuned in real time if DNS horizontal auto-scaling is turned on.
  strategy:
    rollingUpdate:
      maxSurge: 10%
      maxUnavailable: 0
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      volumes:
      - name: kube-dns-config
        configMap:
          name: kube-dns
          optional: true
      containers:
      - name: kubedns
        image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7
        resources:
          # TODO: Set memory limits when we've profiled the container for large
          # clusters, then set request = limit to keep this container in
          # guaranteed class. Currently, this container falls into the
          # "burstable" category so the kubelet doesn't backoff from restarting it.
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        livenessProbe:
          httpGet:
            path: /healthcheck/kubedns
            port: 10054
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          # we poll on pod startup for the Kubernetes master service and
          # only setup the /readiness HTTP server once that's available.
          initialDelaySeconds: 3
          timeoutSeconds: 5
        args:
        - --domain=cluster.local.
        - --dns-port=10053
        - --config-dir=/kube-dns-config
        - --v=2
        env:
        - name: PROMETHEUS_PORT
          value: "10055"
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
        - containerPort: 10055
          name: metrics
          protocol: TCP
        volumeMounts:
        - name: kube-dns-config
          mountPath: /kube-dns-config
      - name: dnsmasq
        image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7
        livenessProbe:
          httpGet:
            path: /healthcheck/dnsmasq
            port: 10054
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        args:
        - -v=2
        - -logtostderr
        - -configDir=/etc/k8s/dns/dnsmasq-nanny
        - -restartDnsmasq=true
        - --
        - -k
        - --cache-size=1000
        - --no-negcache
        - --log-facility=-
        - --server=/cluster.local/127.0.0.1#10053
        - --server=/in-addr.arpa/127.0.0.1#10053
        - --server=/ip6.arpa/127.0.0.1#10053
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        # see: https://github.com/kubernetes/kubernetes/issues/29055 for details
        resources:
          requests:
            cpu: 150m
            memory: 20Mi
        volumeMounts:
        - name: kube-dns-config
          mountPath: /etc/k8s/dns/dnsmasq-nanny
      - name: sidecar
        image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.7
        livenessProbe:
          httpGet:
            path: /metrics
            port: 10054
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        args:
        - --v=2
        - --logtostderr
        - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
        - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
        ports:
        - containerPort: 10054
          name: metrics
          protocol: TCP
        resources:
          requests:
            memory: 20Mi
            cpu: 10m
      dnsPolicy: Default  # Don't use cluster DNS.
      serviceAccountName: kube-dns

Deploy kube-dns to the cluster:

kubectl create -f kube-dns.yaml
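
Check that the kube-dns Pod comes up and is running:

kubectl get pods -n kube-system -l k8s-app=kube-dns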

IT IS DONE! Great Work 👨‍💻

Now deploy some services on your shiny new, hard-way-built Kubernetes Cluster!

# Create deployment of nginx with 10 replicas
kubectl run nginx --image=nginx --replicas=10
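
Watch the Pods getting scheduled across the Worker nodes and receiving IP addresses from the individual POD_CIDR ranges:

kubectl get pods -o wide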

Cleaning Up

If you are finished and want to destroy your whole infrastructure, just execute:

# Terraform
terraform destroy

# CDK
cdk destroy

The beauty of Infrastructure as Code 🥰

Further Steps / Ideas

For further training/learning you can do a lot of things with your handmade cluster!

Here just some ideas:

  • Deploy an Ingress service (like nginx-ingress / aws-alb-ingress)
  • Increase the master/worker node size in the IaC (CDK/Terraform), deploy the changes and join the new nodes to your cluster
  • Manually kill etcd/master/worker instances and learn how Kubernetes reacts
    • what info do you get?
    • where do you find important logs?
    • what steps can you take to improve cluster healthiness?
    • what happens when the AutoScalingGroup starts a new instance, e.g. a new K8s Worker node? (no certs, keys available for this new IP address, etc.)
  • Enhance the UserData to assign ENIs from a pre-defined internal IP address pool (adapt the LaunchConfigurations for etcd, master, worker)
  • Get to know why it makes sense to use tools like kubeadm

Final Words

If you encounter any problems or have some ideas on how to enhance the IaC code ➡️ please let me know!

I would be very happy to see some Pull Requests on GitHub for the Terraform and CDK Python code of this blog post: