Kubernetes The (real) Hard Way on AWS
Works with Kubernetes 1.16!
In preparation for the Cloud Native Computing Foundation Certified Kubernetes Administrator exam (CNCF CKA for short), it is important to learn the ins and outs of creating Kubernetes clusters by hand. This includes generating all the certificates, systemd unit files and K8s configs, and installing the components.
Most of you may have already seen Kelsey Hightower’s fantastic “Kubernetes The Hard Way” tutorial on GitHub. There is even a LinuxAcademy course and forks of the tutorial for Bare Metal installations (GitHub / Medium). And of course there are already several guides for AWS (here and here).
So why write another one? Simple answer: I could not find a single all-in-one guide for a multi-master, non-stacked Kubernetes setup on AWS. Besides, it was a good use case to take some first baby steps with the AWS CDK for Python :)
If you haven’t already checked out the CDK, give it a try! => https://aws.amazon.com/cdk
So what is this all about?
Executive Summary: it creates the infrastructure and components for a multi-node non-stacked Kubernetes Cluster (v1.16!) on AWS - but doin’ it the real hard way!
Features
- CDK code available (Python3, generates CloudFormation)
- Terraform code available (>= 0.12 / HCL2)
- Multi Master HA Kubernetes control plane
- Non-Stacked setup (etcd servers are running on their own instances)
- 10x EC2 instances
- 1x Bastion Host (single-node ASG)
- 3x etcd nodes (ASG)
- 3x Kubernetes Master nodes (ASG)
- 3x Kubernetes Worker nodes (ASG)
- 3x Load Balancers
- Bastion Host LB: for safe access to Bastion Host instance
- K8s-master-public-LB: external kubectl access
- K8s-master-private-LB: fronts Kubernetes API Servers
- Route53 records for all EC2 instances’ internal IPv4 addresses (ease of use)
- External access via Bastion Host (SSH) & public Load Balancer (kubectl) only
- Access to BastionLB & MasterPublicLB from your workstation IP only (by default)
- Strict SecurityGroups
Infrastructure as Code
AWS CDK (Python, CloudFormation):
Create Infrastructure
⚡ The infrastructure created with Terraform and/or CDK may not be suitable for production usage, but it is safe to use: it creates AutoScalingGroups (with LaunchConfigurations), a Bastion Host and public/private LoadBalancers, and assigns tightened SecurityGroups to all resources.
First we need to create our infrastructure; you can use either the AWS CDK or the Terraform repository above. Both create the same infrastructure with ten EC2 instances by default for a fully non-stacked Kubernetes setup. This means the etcd nodes run on their own instances and not on top of the K8s Master nodes (which would be a stacked setup).
If you change the number of nodes, you have to adapt the below instructions accordingly.
In the IaC, set the Route53 Hosted Zone you want to use (Terraform: var.hosted_zone / CDK: zone_fqdn). This will create an A record for the Bastion Host like this:
- bastion.example.com
It also provisions the Bastion Host with a User Data script to install the cfssl binary (by CloudFlare) for easy creation of all the CSRs, certificates and keys.
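For reference, the cfssl install step in the User Data looks roughly like this (a minimal sketch; the version and download URLs are assumptions, check the IaC repositories for the exact commands):
# Sketch of a cfssl/cfssljson install - version and URLs are assumptions
CFSSL_VERSION=1.4.1
sudo curl -sL -o /usr/local/bin/cfssl "https://github.com/cloudflare/cfssl/releases/download/v${CFSSL_VERSION}/cfssl_${CFSSL_VERSION}_linux_amd64"
sudo curl -sL -o /usr/local/bin/cfssljson "https://github.com/cloudflare/cfssl/releases/download/v${CFSSL_VERSION}/cfssljson_${CFSSL_VERSION}_linux_amd64"
sudo chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson
cfssl version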
SecurityGroups
Overview of SecurityGroups:
Component | Source | Protocol | Port | Description |
---|---|---|---|---|
Bastion | Bastion LB | TCP | 22 | SSH from Bastion LB |
Bastion LB | Workstation | TCP | 22 | SSH from Workstation |
etcd | Bastion | TCP | 22 | SSH from Bastion |
etcd | K8s-Master | TCP | 2379 | etcd-client |
etcd | K8s-Master | TCP | 2380 | etcd-server |
K8s-PublicLB | Workstation | TCP | 6443 | kubectl from Workstation |
K8s-PrivateLB | Masters | TCP | 6443 | kube-api from Masters |
K8s-PrivateLB | Workers | TCP | 6443 | kube-api from Workers |
K8s-Master | Bastion | TCP | 22 | SSH from Bastion |
K8s-Master | MasterPublicLB | TCP | 6443 | kubectl from Public LB |
K8s-Master | MasterPrivateLB | TCP | 6443 | kube-api from Private LB |
K8s-Master | Bastion | TCP | 6443 | kubectl access from Bastion |
K8s-Master | K8s-Worker | any | any | Allow any from K8s-Worker |
K8s-Worker | Bastion | TCP | 22 | SSH from Bastion |
K8s-Worker | Bastion | TCP | 6443 | kubectl access from Bastion |
K8s-Worker | K8s-Master | any | any | Allow any from K8s-Master |
Load Balancers
And finally we will have these three Classic Elastic Load Balancers (CLB):
- K8s-Master Public ELB (for remote kube-apiserver / kubectl access from your workstation)
- K8s-Master Private ELB (fronts kube-apiservers)
- Bastion ELB (for secure SSH access into the Private Subnets)
Terraform
To apply the infrastructure with Terraform >=0.12, just clone my repository and create a terraform.tfvars file with your configuration:
git clone git@github.com:hajowieland/terraform-k8s-the-real-hard-way-aws.git
Requirements
Required Variables
You have to change the values of at least these two TF variables in your terraform.tfvars file:
TF Variable | Description | Type | Default | Example |
---|---|---|---|---|
owner | Your Name | string | napo.io | Max Mustermann |
hosted_zone | Route53 Hosted Zone Name for DNS records | string | "" | test.example.com |
There are more variables to configure, but most of them have sane default values.
SSH KEY: If you do not specify a pre-existing AWS Key Pair with var.aws_key_pair_name, then Terraform creates a new one with your ~/.ssh/id_rsa key by default. You can change this path by setting var.ssh_public_key_path.
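For orientation, a minimal terraform.tfvars could look like this (the values are examples; the variable names are taken from the table and notes above, and the commented lines only apply if you deviate from the defaults):
cat > terraform.tfvars <<EOF
owner       = "Max Mustermann"
hosted_zone = "test.example.com"
# Optional: use an existing AWS Key Pair or a different public key path
# aws_key_pair_name   = "MyKeyPairName"
# ssh_public_key_path = "~/.ssh/id_rsa.pub"
EOF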
Deploy the infrastructure with Terraform:
terraform init
terraform plan
terraform apply
CDK
To apply the infrastructure with the AWS CDK, just clone my repository and edit the cdk_python_k8s_right_way_aws_stack.py file accordingly:
git clone git@github.com:hajowieland/cdk-py-k8s-the-real-hard-way-aws.git
Requirements
- cdk installed via npm install -g cdk
- awscli with default profile configured
- Existing AWS Key Pair
- Python3
- In the repository, create a VirtualEnv: virtualenv .env -p python3 and then source .env/bin/activate
- Install pip3 requirements: pip3 install -r requirements.txt
Required Variables
You have to change the values of at least these three variables in your cdk_python_k8s_right_way_aws_stack.py file:
variable | Description | Type | Default | Example |
---|---|---|---|---|
ssh_key_pair | Existing AWS Key Pair | string | id_rsa | MyKeyPairName
tag_owner | Your Name | string | napo.io | Holy Kubernetus |
zone_fqdn | Route53 Hosted Zone Name | string | '' | test.example.com |
There are more variables to configure, most of them have sane default values.
Deploy the infrastructure with CDK:
cdk synth # outputs the rendered CloudFormation code
cdk deploy
BE AWARE: If you change the default value for the project tag in CDK (tag_project) or Terraform (var.project), then you have to adapt the filters in all following aws-cli commands!
Defaults
By default, all EC2 instances are created in the us-east-1 AWS region.
Connect to Bastion Host
Now everything will take place on the Bastion host.
Connect via SSH, either with the AWS Key Pair name you configured or, when using Terraform, with the newly created AWS Key Pair (CDK/CloudFormation does not support creating Key Pairs).
It can take a few moments until all resources are ready.
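For example (assuming the Bastion Route53 record from the beginning, the default ec2-user login of the Bastion AMI and your private key at ~/.ssh/id_rsa - adapt domain, user and key path to your setup):
ssh -i ~/.ssh/id_rsa ec2-user@bastion.example.com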
Get the internal IPs
On the Bastion Host, we first need to know the IPv4 addresses for all EC2 instances we created (the internal IPs - all nodes are running in private subnets).
In the UserData we already set some global environment variables to make your life easier, so you do not have to set them every time:
# Already set during infrastructure deployment
# Verify the values:
# echo $HOSTEDZONE_NAME && echo $AWS_DEFAULT_REGION
export HOSTEDZONE_NAME=napo.io # << from var.hosted_zone / zone_fqdn
export AWS_DEFAULT_REGION=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document|grep region|awk -F\" '{print $4}') # << AWS Region the EC2 instance is running for use with awscli
Before we continue with the next steps, get the Hosted Zone ID and export it as environment variable:
# Gets HostedZone ID from its name
export HOSTEDZONE_ID=$(aws route53 list-hosted-zones-by-name --dns-name $HOSTEDZONE_NAME --query 'HostedZones[].Id' --output text | cut -d/ -f3)
The next Shell commands help us identify the EC2 instances and create Route53 records for ease of use:
- In a for loop, we get all instances by querying the AutoScalingGroups with a specific LaunchConfiguration prefix
- Tag the EC2 instance with an incrementing number
- Get the private IP address of the EC2 instance
- Create a Route53 RecordSet JSON file with a heredoc (see example below)
- Create the Route53 record with the private IP address and component name + incremented number
The temporary JSON files for the Route53 records look like this (just an example):
{
"Comment":"Create/Update etcd A record",
"Changes":[{
"Action":"UPSERT",
"ResourceRecordSet":{
"Name":"etcd$i.$ZONENAME",
"Type":"A",
"TTL":30,
"ResourceRecords":[{
"Value":"$IP"
}]
}
}]
}
⚠️ Be aware that you have to manually delete these Route53 records when you’re finished.
Now execute the following shell commands on the Bastion Host:
Etcd:
i=1
for INSTANCE in $(aws autoscaling describe-auto-scaling-instances --query 'AutoScalingInstances[?starts_with(LaunchConfigurationName, `etcd`)].[InstanceId]' --output text); do
aws ec2 create-tags --resources $INSTANCE --tags Key=Name,Value=etcd$i
IP=$(aws ec2 describe-instances --instance-id $INSTANCE --query 'Reservations[].Instances[].[PrivateIpAddress]' --output text)
cat << EOF > /tmp/record.json
{
"Comment":"Create/Update etcd A record",
"Changes":[{
"Action":"UPSERT",
"ResourceRecordSet":{
"Name":"etcd$i.internal.$HOSTEDZONE_NAME",
"Type":"A",
"TTL":30,
"ResourceRecords":[{
"Value":"$IP"
}]
}
}]
}
EOF
aws route53 change-resource-record-sets --hosted-zone-id $HOSTEDZONE_ID --change-batch file:///tmp/record.json
export ETCD${i}_INTERNAL=$IP
i=$((i+1))
done
Master:
i=1
for INSTANCE in $(aws autoscaling describe-auto-scaling-instances --query 'AutoScalingInstances[?starts_with(LaunchConfigurationName, `master`)].[InstanceId]' --output text); do
aws ec2 create-tags --resources $INSTANCE --tags Key=Name,Value=master$i
IP=$(aws ec2 describe-instances --instance-id $INSTANCE --query 'Reservations[].Instances[].[PrivateIpAddress]' --output text)
cat << EOF > /tmp/record.json
{
"Comment":"Create/Update master A record",
"Changes":[{
"Action":"UPSERT",
"ResourceRecordSet":{
"Name":"master$i.internal.$HOSTEDZONE_NAME",
"Type":"A",
"TTL":30,
"ResourceRecords":[{
"Value":"$IP"
}]
}
}]
}
EOF
aws route53 change-resource-record-sets --hosted-zone-id $HOSTEDZONE_ID --change-batch file:///tmp/record.json
export MASTER${i}_INTERNAL=$IP
i=$((i+1))
done
Worker:
i=1
for INSTANCE in $(aws autoscaling describe-auto-scaling-instances --query 'AutoScalingInstances[?starts_with(LaunchConfigurationName, `worker`)].[InstanceId]' --output text); do
aws ec2 create-tags --resources $INSTANCE --tags Key=Name,Value=worker$i
IP=$(aws ec2 describe-instances --instance-id $INSTANCE --query 'Reservations[].Instances[].[PrivateIpAddress]' --output text)
cat << EOF > /tmp/record.json
{
"Comment":"Create/Update worker A record",
"Changes":[{
"Action":"UPSERT",
"ResourceRecordSet":{
"Name":"worker$i.internal.$HOSTEDZONE_NAME",
"Type":"A",
"TTL":30,
"ResourceRecords":[{
"Value":"$IP"
}]
}
}]
}
EOF
aws route53 change-resource-record-sets --hosted-zone-id $HOSTEDZONE_ID --change-batch file:///tmp/record.json
export WORKER${i}_INTERNAL=$IP
i=$((i+1))
done
Setup SSH config
This step really makes your life easier and in the following steps we will use the names configured in the SSH client config file.
ℹ️ Replace the IdentityFile (your OpenSSH key used as AWS EC2 Key Pair) and the HostName domain accordingly - do not forget to add your SSH private key to the Bastion Host (e.g. place it at $HOME/.ssh/id_rsa).
# On your workstation (macOS - and if your private key is id_rsa)
cat ~/.ssh/id_rsa | pbcopy
~/.ssh/id_rsa:
# On Bastion
vi ~/.ssh/id_rsa
chmod 400 ~/.ssh/id_rsa
On the Bastion Host, open the SSH config file and adapt to your setup (replace napo.io with your domain):
~/.ssh/config:
Host etcd1 etcd2 etcd3 master1 master2 master3 worker1 worker2 worker3
User ubuntu
HostName %h.internal.napo.io
IdentityFile ~/.ssh/id_rsa
chmod 600 ~/.ssh/config
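To verify that the SSH config and the copied private key work, you can run a quick non-interactive check against all nodes (optional sanity check; node names match the SSH config above):
for host in etcd1 etcd2 etcd3 master1 master2 master3 worker1 worker2 worker3; do
  ssh -o BatchMode=yes -o StrictHostKeyChecking=no ${host} hostname
done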
Create Kubernetes Cluster the (real) hard way
With the preparation done, we now start doing Kubernetes The (real) Hard Way on AWS 🥳
Notes
For parallel execution on multiple instances at once, use tmux and this multiplexer script: https://gist.github.com/dmytro/3984680.
Both are already installed on the Bastion Host (the multiplexer script can be found in the ec2-user $HOME directory).
Later, use tmux and execute the tmux-multi.sh script to run commands on multiple instances (all etcd/master/worker nodes) at once.
But first we create the certificates on the Bastion Host and transfer them to the instances.
Create certificates
Now let’s start creating all the stuff needed for Kubernetes (CA, Signing Requests, Certs, Keys etc.).
Certificate Authority
For our Certificate Authority (lifetime 17520h => 2 years) we create a CA config file and a Certificate Signing Request (CSR) for the 4096-bit RSA key:
cat > ca-config.json <<EOF
{
"signing": {
"default": {
"expiry": "17520h"
},
"profiles": {
"kubernetes": {
"usages": ["signing", "key encipherment", "server auth", "client auth"],
"expiry": "17520h"
}
}
}
}
EOF
cat > ca-csr.json <<EOF
{
"CN": "Kubernetes",
"key": {
"algo": "rsa",
"size": 4096
},
"names": [
{
"C": "DE",
"L": "Munich",
"O": "Kubernetes",
"OU": "Kubernetes The Real Hard Way",
"ST": "$HOSTEDZONE_NAME"
}
]
}
EOF
Generate the Certificate Authority certificate and key from the previous CA config and CSR:
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
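Optionally inspect the resulting CA certificate with openssl (assumed to be available on the Bastion Host) - it should show CN = Kubernetes and a validity of roughly two years:
openssl x509 -in ca.pem -noout -subject -dates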
Client and Server Certificates
Now create client and server certificates with their corresponding CSRs:
Admin Client Certificate
cat > admin-csr.json <<EOF
{
"CN": "admin",
"key": {
"algo": "rsa",
"size": 4096
},
"names": [
{
"C": "DE",
"L": "Munich",
"O": "system:masters",
"OU": "Kubernetes The Real Hard Way",
"ST": "$HOSTEDZONE_NAME"
}
]
}
EOF
Generate the admin client certificate and key:
cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
admin-csr.json | cfssljson -bare admin
Kubelet Client Certificates
Here we get all the Worker nodes, identify them via their LaunchConfiguration name and create the CSRs:
WORKERCOUNT=$(aws autoscaling describe-auto-scaling-instances --query 'AutoScalingInstances[?starts_with(LaunchConfigurationName, `worker`)].[InstanceId]' --output text | wc -l)
i=1
while [ "$i" -le "$WORKERCOUNT" ]; do
cat > worker${i}-csr.json <<EOF
{
"CN": "system:node:worker${i}.internal.${HOSTEDZONE_NAME}",
"key": {
"algo": "rsa",
"size": 4096
},
"names": [
{
"C": "DE",
"L": "Munich",
"O": "system:nodes",
"OU": "Kubernetes The Real Hard Way",
"ST": "${HOSTEDZONE_NAME}"
}
]
}
EOF
i=$(($i + 1))
done
Create the keys for all Worker nodes:
i=1
while [ "$i" -le "$WORKERCOUNT" ]; do
cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-hostname=worker${i}.internal.${HOSTEDZONE_NAME} \
-profile=kubernetes \
worker${i}-csr.json | cfssljson -bare worker${i}
i=$(($i + 1))
done
kube-controller-manager Certificate
Generate the kube-controller-manager client certificate and private key:
cat > kube-controller-manager-csr.json <<EOF
{
"CN": "system:kube-controller-manager",
"key": {
"algo": "rsa",
"size": 4096
},
"names": [
{
"C": "DE",
"L": "Munich",
"O": "system:kube-controller-manager",
"OU": "Kubernetes The Real Hard Way",
"ST": "${HOSTEDZONE_NAME}"
}
]
}
EOF
cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager
kube-proxy Client Certificate
Now create everything needed for the kube-proxy component.
First, again, the CSR:
cat > kube-proxy-csr.json <<EOF
{
"CN": "system:kube-proxy",
"key": {
"algo": "rsa",
"size": 4096
},
"names": [
{
"C": "DE",
"L": "Munich",
"O": "system:node-proxier",
"OU": "Kubernetes The Real Hard Way",
"ST": "$HOSTEDZONE_NAME"
}
]
}
EOF
… and then generate the key for kube-proxy:
cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
kube-proxy-csr.json | cfssljson -bare kube-proxy
kube-scheduler Client Certificate
Generate the kube-scheduler client certificate and private key:
cat > kube-scheduler-csr.json <<EOF
{
"CN": "system:kube-scheduler",
"key": {
"algo": "rsa",
"size": 4096
},
"names": [
{
"C": "DE",
"L": "Munich",
"O": "system:kube-scheduler",
"OU": "Kubernetes The Real Hard Way",
"ST": "$HOSTEDZONE_NAME"
}
]
}
EOF
cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
kube-scheduler-csr.json | cfssljson -bare kube-scheduler
kube-controller-manager ServiceAccount Token
To sign ServiceAccount tokens with the kube-controller-manager (see Documentation), create the certificate and private key:
cat > service-account-csr.json <<EOF
{
"CN": "service-accounts",
"key": {
"algo": "rsa",
"size": 4096
},
"names": [
{
"C": "DE",
"L": "Munich",
"O": "Kubernetes",
"OU": "Kubernetes The Real Hard Way",
"ST": "$HOSTEDZONE_NAME"
}
]
}
EOF
cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-profile=kubernetes \
service-account-csr.json | cfssljson -bare service-account
Kubernetes API Server Certificate
And finally the CSR for kube-apiserver:
cat > kubernetes-csr.json <<EOF
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 4096
},
"names": [
{
"C": "DE",
"L": "Munich",
"O": "Kubernetes",
"OU": "Kubernetes The Real Hard Way",
"ST": "$HOSTEDZONE_NAME"
}
]
}
EOF
For generating the kube-apiserver certificate, we need to define all the IP addresses and hostnames that will be used to access the API server.
ℹ️ NOTE: 10.32.0.1 ==> kubernetes.default.svc.cluster.local.
https://github.com/kelseyhightower/kubernetes-the-hard-way/issues/105
First get the Kubernetes Master ELBs DNS names via their prefixes and assign them to envvars:
MASTER_ELB_PRIVATE=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[? starts_with(DNSName, `internal-master`)]| [].DNSName' --output text)
MASTER_ELB_PUBLIC=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[? starts_with(DNSName, `master`)]| [].DNSName' --output text)
Generate the API server certificate and key, adapting the number of etcd, worker and master nodes to your setup (three nodes each by default).
We use the environment variables we created earlier:
cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-hostname=10.32.0.1,${ETCD1_INTERNAL},\
${ETCD2_INTERNAL},${ETCD3_INTERNAL},\
${MASTER1_INTERNAL},${MASTER2_INTERNAL},\
${MASTER3_INTERNAL},${WORKER1_INTERNAL},\
${WORKER2_INTERNAL},${WORKER3_INTERNAL},\
etcd1.internal.${HOSTEDZONE_NAME},\
etcd2.internal.${HOSTEDZONE_NAME},\
etcd3.internal.${HOSTEDZONE_NAME},\
master1.internal.${HOSTEDZONE_NAME},\
master2.internal.${HOSTEDZONE_NAME},\
master3.internal.${HOSTEDZONE_NAME},\
worker1.internal.${HOSTEDZONE_NAME},\
worker2.internal.${HOSTEDZONE_NAME},\
worker3.internal.${HOSTEDZONE_NAME},\
${MASTER_ELB_PRIVATE},${MASTER_ELB_PUBLIC},\
127.0.0.1,kubernetes.default \
-profile=kubernetes \
kubernetes-csr.json | cfssljson -bare kubernetes
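Optionally verify that all internal IPs, DNS names and the two ELB DNS names ended up as Subject Alternative Names in the certificate (sanity check with openssl):
openssl x509 -in kubernetes.pem -noout -text | grep -A1 "Subject Alternative Name"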
Distribute the Client and Server Certificates
Now with everything in place, we scp all certificates to the instances.
ℹ️ The below commands only work if you have created the ~/.ssh/config entries as stated at the beginning!
If you have changed the defaults, adapt the number of etcd/master/worker nodes to match your setup.
etcd
for etcd in etcd1 etcd2 etcd3; do
scp ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem ${etcd}:~/
done
Masters/Controllers
for master in master1 master2 master3; do
scp ca.pem ca-key.pem kubernetes-key.pem \
kubernetes.pem service-account-key.pem service-account.pem \
${master}:~/
done
Workers
for worker in worker1 worker2 worker3; do
scp ca.pem ${worker}-key.pem ${worker}.pem ${worker}:~/
done
Generating Kubernetes Configuration Files for Authentication
In this step we generate the kubeconfig files needed for authentication in Kubernetes.
Client Authentication Configs
kubelet Kubernetes Configuration Files
Generate kubeconfig configuration files for kubelet of every worker:
for i in 1 2 3; do
instance="worker${i}"
instance_hostname="worker${i}.internal.$HOSTEDZONE_NAME"
kubectl config set-cluster kubernetes-the-real-hard-way \
--certificate-authority=ca.pem \
--embed-certs=true \
--server=https://${MASTER_ELB_PRIVATE}:6443 \
--kubeconfig=${instance}.kubeconfig
kubectl config set-credentials system:node:${instance_hostname} \
--client-certificate=${instance}.pem \
--client-key=${instance}-key.pem \
--embed-certs=true \
--kubeconfig=${instance}.kubeconfig
kubectl config set-context default \
--cluster=kubernetes-the-real-hard-way \
--user=system:node:${instance_hostname} \
--kubeconfig=${instance}.kubeconfig
kubectl config use-context default \
--kubeconfig=${instance}.kubeconfig
done
The kube-proxy Kubernetes Configuration File
Generate the kube-proxy kubeconfig:
kubectl config set-cluster kubernetes-the-real-hard-way \
--certificate-authority=ca.pem \
--embed-certs=true \
--server=https://${MASTER_ELB_PRIVATE}:6443 \
--kubeconfig=kube-proxy.kubeconfig
kubectl config set-credentials kube-proxy \
--client-certificate=kube-proxy.pem \
--client-key=kube-proxy-key.pem \
--embed-certs=true \
--kubeconfig=kube-proxy.kubeconfig
kubectl config set-context default \
--cluster=kubernetes-the-real-hard-way \
--user=kube-proxy \
--kubeconfig=kube-proxy.kubeconfig
kubectl config use-context default \
--kubeconfig=kube-proxy.kubeconfig
The kube-controller-manager Kubernetes Configuration File
Generate the kube-controller-manager kubeconfig:
kubectl config set-cluster kubernetes-the-real-hard-way \
--certificate-authority=ca.pem \
--embed-certs=true \
--server=https://127.0.0.1:6443 \
--kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-credentials system:kube-controller-manager \
--client-certificate=kube-controller-manager.pem \
--client-key=kube-controller-manager-key.pem \
--embed-certs=true \
--kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-context default \
--cluster=kubernetes-the-real-hard-way \
--user=system:kube-controller-manager \
--kubeconfig=kube-controller-manager.kubeconfig
kubectl config use-context default --kubeconfig=kube-controller-manager.kubeconfig
The kube-scheduler Kubernetes Configuration File
Generate the kubeconfig file for the kube-scheduler component:
kubectl config set-cluster kubernetes-the-real-hard-way \
--certificate-authority=ca.pem \
--embed-certs=true \
--server=https://127.0.0.1:6443 \
--kubeconfig=kube-scheduler.kubeconfig
kubectl config set-credentials system:kube-scheduler \
--client-certificate=kube-scheduler.pem \
--client-key=kube-scheduler-key.pem \
--embed-certs=true \
--kubeconfig=kube-scheduler.kubeconfig
kubectl config set-context default \
--cluster=kubernetes-the-real-hard-way \
--user=system:kube-scheduler \
--kubeconfig=kube-scheduler.kubeconfig
kubectl config use-context default --kubeconfig=kube-scheduler.kubeconfig
The admin Kubernetes Configuration File
And finally, the kubeconfig file for our admin user (that’s you 🙂):
kubectl config set-cluster kubernetes-the-real-hard-way \
--certificate-authority=ca.pem \
--embed-certs=true \
--server=https://127.0.0.1:6443 \
--kubeconfig=admin.kubeconfig
kubectl config set-credentials admin \
--client-certificate=admin.pem \
--client-key=admin-key.pem \
--embed-certs=true \
--kubeconfig=admin.kubeconfig
kubectl config set-context default \
--cluster=kubernetes-the-real-hard-way \
--user=admin \
--kubeconfig=admin.kubeconfig
kubectl config use-context default --kubeconfig=admin.kubeconfig
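You can quickly inspect any of the generated kubeconfig files, e.g. the admin one (the embedded certificates are shown redacted):
kubectl config view --kubeconfig=admin.kubeconfig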
Distribute the Kubernetes Configuration Files
Now transfer the kubelet & kube-proxy kubeconfig files to the worker nodes:
for worker in worker1 worker2 worker3; do
scp ${worker}.kubeconfig kube-proxy.kubeconfig ${worker}:~/
done
And then the admin, kube-controller-manager & kube-scheduler kubeconfig files to the master nodes:
for master in master1 master2 master3; do
scp admin.kubeconfig \
kube-controller-manager.kubeconfig \
kube-scheduler.kubeconfig \
${master}:~/
done
Generating the Data Encryption Config and Key
For encryption we first create a secure encryption key and then the EncryptionConfiguration.
The Encryption Key
ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)
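The key must be exactly 32 bytes for aescbc - a quick optional sanity check:
echo -n $ENCRYPTION_KEY | base64 --decode | wc -c
# expected output: 32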
The Encryption Config File
cat > encryption-config.yaml <<EOF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
- secrets
providers:
- aescbc:
keys:
- name: key1
secret: ${ENCRYPTION_KEY}
- identity: {}
EOF
Transfer the encryption config file to the master nodes:
for master in master1 master2 master3; do
scp encryption-config.yaml ${master}:~/
done
Bootstrapping the etcd Cluster
Now it is time to bootstrap our etcd cluster, which is our highly available key-value store for the Kubernetes API.
Think of it as the Kube API’s persistent storage for saving the state of all resources.
Now it is time to use the power of tmux and the multiplexer script:
- Start tmux
- Execute $HOME/tmux-multi.sh
- Enter etcd1 etcd2 etcd3 (or more, according to your setup and how you configured your SSH config at the beginning)
Now we can execute the following commands in parallel on each etcd node.
First we get the etcd host’s name from its EC2 Name tag, set the hostname and add an /etc/hosts entry with its internal IPv4 address:
export ETCDHOST=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)" "Name=key,Values=Name" --output=text | cut -f 5)
sudo hostnamectl set-hostname --static $ETCDHOST.internal.$HOSTEDZONE_NAME
echo "$INTERNAL_IP $ETCDHOST.internal.$HOSTEDZONE_NAME" | sudo tee -a /etc/hosts
Install etcd and move the files:
wget -q --show-progress --https-only --timestamping \
"https://github.com/etcd-io/etcd/releases/download/v3.4.3/etcd-v3.4.3-linux-amd64.tar.gz"
{
tar -xvf etcd-v3.4.3-linux-amd64.tar.gz
sudo mv etcd-v3.4.3-linux-amd64/etcd* /usr/local/bin/
}
{
sudo mkdir -p /etc/etcd /var/lib/etcd
sudo cp ca.pem kubernetes-key.pem kubernetes.pem /etc/etcd/
}
Get the etcd nodes’ IPv4 addresses and export them as envvars.
Again: by default you have three etcd nodes - adapt to your setup if necessary.
for i in 1 2 3; do export ETCD${i}_INTERNAL=$(dig +short etcd${i}.internal.${HOSTEDZONE_NAME}); done
Generate the etcd systemd unit file:
cat > etcd.service <<EOF
[Unit]
Description=etcd
Documentation=https://github.com/coreos
[Service]
ExecStart=/usr/local/bin/etcd \\
--name ${ETCDHOST}.internal.${HOSTEDZONE_NAME} \\
--cert-file=/etc/etcd/kubernetes.pem \\
--key-file=/etc/etcd/kubernetes-key.pem \\
--peer-cert-file=/etc/etcd/kubernetes.pem \\
--peer-key-file=/etc/etcd/kubernetes-key.pem \\
--trusted-ca-file=/etc/etcd/ca.pem \\
--peer-trusted-ca-file=/etc/etcd/ca.pem \\
--peer-client-cert-auth \\
--client-cert-auth \\
--initial-advertise-peer-urls https://${INTERNAL_IP}:2380 \\
--listen-peer-urls https://${INTERNAL_IP}:2380 \\
--listen-client-urls https://${INTERNAL_IP}:2379,http://127.0.0.1:2379 \\
--advertise-client-urls https://${INTERNAL_IP}:2379 \\
--initial-cluster-token etcd-cluster-0 \\
--initial-cluster etcd1.internal.${HOSTEDZONE_NAME}=https://${ETCD1_INTERNAL}:2380,etcd2.internal.${HOSTEDZONE_NAME}=https://${ETCD2_INTERNAL}:2380,etcd3.internal.${HOSTEDZONE_NAME}=https://${ETCD3_INTERNAL}:2380 \\
--initial-cluster-state new \\
--data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
Move the files to the right place, reload systemd and enable + start the etcd service:
sudo mv etcd.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd
Check if etcd works
Check for any errors in systemd:
systemctl status etcd
List etcd members:
ETCDCTL_API=3 etcdctl member list
The output should look like this
OUTPUT:
2d2d6426a2ba46f2, started, etcd3.internal.napo.io, https://10.23.1.109:2380, https://10.23.1.109:2379, false
7e1b60cbd871ed2f, started, etcd1.internal.napo.io, https://10.23.3.168:2380, https://10.23.3.168:2379, false
a879f686f293ea99, started, etcd2.internal.napo.io, https://10.23.2.33:2380, https://10.23.2.33:2379, false
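You can additionally check the health of all members. The local endpoint works without TLS flags because etcd also listens on http://127.0.0.1:2379; a check against the internal DNS names needs the client certificate flags (adapt the names to your setup):
ETCDCTL_API=3 etcdctl endpoint health
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://etcd1.internal.${HOSTEDZONE_NAME}:2379,https://etcd2.internal.${HOSTEDZONE_NAME}:2379,https://etcd3.internal.${HOSTEDZONE_NAME}:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem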
Debug
⚠️ If you somehow messed up your etcd, start the key-value store from scratch like this (Reference: https://github.com/etcd-io/etcd/issues/10101 )
ETCDCTL_API=3 etcdctl del "" --from-key=true
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/default.etcd
sudo systemctl start etcd
Bootstrapping the Kubernetes Control Plane
Now that we have our working etcd cluster, it is time to bootstrap our Kubernetes Master Nodes.
Exit the tmux multiplexer on the etcd nodes so that you’re back on the Bastion Host. Now execute $HOME/tmux-multi.sh again and type in the master nodes:
SSH to master1 master2 master3 via the tmux multiplexer and execute the following commands in parallel on each master node.
First we get the master host’s name from its EC2 Name tag, set the hostname and add an /etc/hosts entry with its internal IPv4 address:
export MASTERHOST=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)" "Name=key,Values=Name" --output=text | cut -f 5)
sudo hostnamectl set-hostname --static $MASTERHOST.internal.$HOSTEDZONE_NAME
echo "$INTERNAL_IP $MASTERHOST.internal.$HOSTEDZONE_NAME" | sudo tee -a /etc/hosts
Get the latest stable Kubernetes version (currently 1.16.3 as of this writing):
KUBERNETES_STABLE=$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)
echo $KUBERNETES_STABLE
Generate the Kubernetes config directory, download the kube components and move them to /usr/local/bin:
sudo mkdir -p /etc/kubernetes/config
wget -q --show-progress --https-only --timestamping \
"https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-apiserver" \
"https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-controller-manager" \
"https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-scheduler" \
"https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kubectl"
chmod +x kube-apiserver kube-controller-manager kube-scheduler kubectl
sudo mv kube-apiserver kube-controller-manager kube-scheduler kubectl /usr/local/bin/
Create the directory for certificates, keys and encryption config and move them there:
sudo mkdir -p /var/lib/kubernetes/
sudo mv ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem encryption-config.yaml /var/lib/kubernetes/
Get the etcd nodes IPv4 addresses for the systemd unit file generation:
for i in 1 2 3; do export ETCD${i}_INTERNAL=$(dig +short etcd${i}.internal.${HOSTEDZONE_NAME}); done
Create the kube-apiserver systemd file.
Here all the fun takes place: options and parameters for kube-apiserver.
You can find the current documentation of all options here: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
cat > kube-apiserver.service <<EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-apiserver \\
--enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,PersistentVolumeClaimResize,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota \\
--advertise-address=${INTERNAL_IP} \\
--allow-privileged=true \\
--apiserver-count=3 \\
--audit-log-maxage=30 \\
--audit-log-maxbackup=3 \\
--audit-log-maxsize=100 \\
--audit-log-path=/var/log/audit.log \\
--authorization-mode=Node,RBAC \\
--bind-address=0.0.0.0 \\
--client-ca-file=/var/lib/kubernetes/ca.pem \\
--etcd-cafile=/var/lib/kubernetes/ca.pem \\
--etcd-certfile=/var/lib/kubernetes/kubernetes.pem \\
--etcd-keyfile=/var/lib/kubernetes/kubernetes-key.pem \\
--etcd-servers=https://${ETCD1_INTERNAL}:2379,https://${ETCD2_INTERNAL}:2379,https://${ETCD3_INTERNAL}:2379 \\
--event-ttl=1h \\
--encryption-provider-config=/var/lib/kubernetes/encryption-config.yaml \\
--insecure-bind-address=127.0.0.1 \\
--kubelet-certificate-authority=/var/lib/kubernetes/ca.pem \\
--kubelet-client-certificate=/var/lib/kubernetes/kubernetes.pem \\
--kubelet-client-key=/var/lib/kubernetes/kubernetes-key.pem \\
--kubelet-https=true \\
--runtime-config=api/all \\
--service-account-key-file=/var/lib/kubernetes/ca-key.pem \\
--service-cluster-ip-range=10.32.0.0/24 \\
--service-node-port-range=30000-32767 \\
--tls-cert-file=/var/lib/kubernetes/kubernetes.pem \\
--tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \\
--v=5
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
Move kube-controller-manager kubeconfig to Kubernetes directory:
sudo mv kube-controller-manager.kubeconfig /var/lib/kubernetes/
Create kube-controller-manager systemd unit file:
cat > kube-controller-manager.service <<EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-controller-manager \\
--address=0.0.0.0 \\
--cluster-cidr=10.200.0.0/16 \\
--cluster-name=kubernetes \\
--cluster-signing-cert-file=/var/lib/kubernetes/ca.pem \\
--cluster-signing-key-file=/var/lib/kubernetes/ca-key.pem \\
--leader-elect=true \\
--master=http://127.0.0.1:8080 \\
--root-ca-file=/var/lib/kubernetes/ca.pem \\
--service-account-private-key-file=/var/lib/kubernetes/ca-key.pem \\
--service-cluster-ip-range=10.32.0.0/24 \\
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
Move kube-scheduler kubeconfig to Kubernetes directory:
sudo mv kube-scheduler.kubeconfig /var/lib/kubernetes/
Create the KubeSchedulerConfiguration config:
cat <<EOF | sudo tee /etc/kubernetes/config/kube-scheduler.yaml
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
clientConnection:
kubeconfig: "/var/lib/kubernetes/kube-scheduler.kubeconfig"
leaderElection:
leaderElect: true
EOF
Create the kube-scheduler systemd unit file:
cat > kube-scheduler.service <<EOF
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-scheduler \\
--leader-elect=true \\
--master=http://127.0.0.1:8080 \\
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
Move the files to the right place, reload systemd and enable + start the kube-* services:
sudo mv kube-apiserver.service kube-scheduler.service kube-controller-manager.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable kube-apiserver kube-controller-manager kube-scheduler
sudo systemctl start kube-apiserver kube-controller-manager kube-scheduler
Verify that everything works
Sadly, kubectl get componentstatuses (short: kubectl get cs) is somewhat deprecated and does not work correctly with Kubernetes 1.16 - the tables get mixed up: https://github.com/kubernetes/kubernetes/issues/83024
But we can check with increased verbosity if everything is healthy:
# Gives lots of output
# Use the curl commands below for health checking
kubectl get cs -v=8
Additionally we check for errors via systemd:
systemctl status kube-apiserver
systemctl status kube-controller-manager
systemctl status kube-scheduler
curl the healthz health check endpoint (you should get an HTTP 200 back):
curl --cacert /var/lib/kubernetes/ca.pem \
--key /var/lib/kubernetes/kubernetes-key.pem \
--cert /var/lib/kubernetes/kubernetes.pem \
-i https://127.0.0.1:6443/healthz
If you’re curious, you can check the version info, too:
curl --cacert /var/lib/kubernetes/ca.pem \
--key /var/lib/kubernetes/kubernetes-key.pem \
--cert /var/lib/kubernetes/kubernetes.pem \
-i https://127.0.0.1:6443/version
If everything looks good we can now move on to RBAC.
RBAC for Kubelet Authorization
Role-Based Access Control (RBAC) is the authorization concept of Kubernetes. We need to create a ClusterRole and its ClusterRoleBinding that allow the kube-apiserver to access the kubelet API on the worker nodes.
Exit the tmux multiplexer and SSH to the first Master instance (master1):
ssh master1
Create and apply a ClusterRole for kubelet (Worker) to kube-apiserver (Master) authorization:
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:kube-apiserver-to-kubelet
rules:
- apiGroups:
- ""
resources:
- nodes/proxy
- nodes/stats
- nodes/log
- nodes/spec
- nodes/metrics
verbs:
- "*"
EOF
Create and apply the corresponding ClusterRoleBinding for the above ClusterRole:
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: system:kube-apiserver
namespace: ""
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:kube-apiserver-to-kubelet
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: User
name: kubernetes
EOF
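Optionally verify that both objects exist:
kubectl get clusterrole system:kube-apiserver-to-kubelet
kubectl get clusterrolebinding system:kube-apiserver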
Bootstrapping the Kubernetes Worker Nodes
Now we create the Worker Nodes that run the Pods in our cluster. They do all the heavy lifting and run most of the user software we deploy on Kubernetes.
Provisioning Kubernetes Worker Nodes
SSH to worker1 worker2 worker3 via the tmux multiplexer and execute the following commands in parallel on each worker node.
First we get the worker host’s name from its EC2 Name tag, set the hostname and add an /etc/hosts entry with its internal IPv4 address:
export WORKERHOST=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)" "Name=key,Values=Name" --output=text | cut -f 5)
sudo hostnamectl set-hostname --static $WORKERHOST.internal.$HOSTEDZONE_NAME
echo "$INTERNAL_IP $WORKERHOST.internal.$HOSTEDZONE_NAME" | sudo tee -a /etc/hosts
Get current Kubernetes stable version:
KUBERNETES_STABLE=$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)
Install the OS dependencies on Ubuntu via apt-get:
sudo apt-get update
sudo apt-get -y install socat conntrack ipset
Download & Install Worker Binaries
Download the CNI Plugins and worker binaries (kubelet, kube-proxy, kubectl):
wget -q --show-progress --https-only --timestamping \
https://github.com/containernetworking/plugins/releases/download/v0.8.2/cni-plugins-linux-amd64-v0.8.2.tgz \
https://github.com/containerd/containerd/releases/download/v1.3.0/containerd-1.3.0.linux-amd64.tar.gz \
https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kubectl \
https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kube-proxy \
https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_STABLE}/bin/linux/amd64/kubelet
Create the installation directories:
sudo mkdir -p \
/etc/cni/net.d \
/opt/cni/bin \
/var/lib/kubelet \
/var/lib/kube-proxy \
/var/lib/kubernetes \
/var/run/kubernetes
and finally extract and move the CNI plugins and binaries there:
sudo tar -xvf cni-plugins-linux-amd64-v0.8.2.tgz -C /opt/cni/bin/
sudo tar -xvf containerd-1.3.0.linux-amd64.tar.gz -C /
chmod +x kubectl kube-proxy kubelet
sudo mv kubectl kube-proxy kubelet /usr/local/bin/
Configure CNI Networking
Now we configure the CIDR ranges for the Pod network. This is the network the Pods on every Worker node use to communicate with each other across nodes.
We configure the Kubernetes CNI here with bridge and loopback interfaces and add the routes in the AWS Route Tables later. We could of course use an overlay network CNI like flannel or Calico, but for our "the hard way" setup there is more to learn from creating it ourselves.
But please, play around with other CNIs later, get to know the pros and cons and when it makes sense to use one over the other (for example because of NetworkPolicies).
ℹ️ A little shady trick 🤯 I do in the IaC of every worker node’s UserData: it generates a random number between 10 and 250 and exports the resulting CIDR as environment variable POD_CIDR. This envvar is used in the next command for creating the bridge config. Default value of POD_CIDR: 10.200.$RANDOM_NUMBER.0/24
echo $POD_CIDR
Create the bridge network configuration file:
cat <<EOF | sudo tee /etc/cni/net.d/10-bridge.conf
{
"cniVersion": "0.3.1",
"name": "bridge",
"type": "bridge",
"bridge": "cnio0",
"isGateway": true,
"ipMasq": true,
"ipam": {
"type": "host-local",
"ranges": [
[{"subnet": "${POD_CIDR}"}]
],
"routes": [{"dst": "0.0.0.0/0"}]
}
}
EOF
Create the loopback network configuration file:
cat <<EOF | sudo tee /etc/cni/net.d/99-loopback.conf
{
"cniVersion": "0.3.1",
"type": "loopback"
}
EOF
Configure containerd
Install runc, a CLI tool for spawning and running containers according to the OCI runtime specification.
sudo apt-get install runc -y
Create the containerd configuration TOML file:
sudo mkdir -p /etc/containerd/
cat << EOF | sudo tee /etc/containerd/config.toml
[plugins]
[plugins.cri.containerd]
snapshotter = "overlayfs"
[plugins.cri.containerd.default_runtime]
runtime_type = "io.containerd.runtime.v1.linux"
runtime_engine = "/usr/sbin/runc"
runtime_root = ""
[plugins.cri.containerd.untrusted_workload_runtime]
runtime_type = "io.containerd.runtime.v1.linux"
runtime_engine = "/usr/sbin/runsc"
runtime_root = "/run/containerd/runsc"
EOF
ℹ️ INFO: Untrusted workloads will be run using the gVisor (runsc) container runtime sandbox.
Create the containerd.service systemd unit file:
cat <<EOF | sudo tee /etc/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target
[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
[Install]
WantedBy=multi-user.target
EOF
Configure the Kubelet
Now to the configuration of the kubelet. Move the certs, keys and the kubeconfig to the right directories:
sudo mv $(hostname -s)-key.pem /var/lib/kubelet/
sudo mv $(hostname -s).pem /var/lib/kubelet/
sudo mv $(hostname -s).kubeconfig /var/lib/kubelet/kubeconfig
sudo mv ca.pem /var/lib/kubernetes/
Create a simple kubelet configuration file (KubeletConfiguration):
cat <<EOF | sudo tee /var/lib/kubelet/kubelet-config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
anonymous:
enabled: false
webhook:
enabled: true
x509:
clientCAFile: "/var/lib/kubernetes/ca.pem"
authorization:
mode: Webhook
clusterDomain: "cluster.local"
clusterDNS:
- "10.32.0.10"
podCIDR: "${POD_CIDR}"
runtimeRequestTimeout: "15m"
tlsCertFile: "/var/lib/kubelet/$(hostname -s).pem"
tlsPrivateKeyFile: "/var/lib/kubelet/$(hostname -s)-key.pem"
EOF
Create the kubelet.service systemd unit file:
cat <<EOF | sudo tee /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=containerd.service
Requires=containerd.service
[Service]
ExecStart=/usr/local/bin/kubelet \\
--config=/var/lib/kubelet/kubelet-config.yaml \\
--container-runtime=remote \\
--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \\
--image-pull-progress-deadline=2m \\
--kubeconfig=/var/lib/kubelet/kubeconfig \\
--network-plugin=cni \\
--register-node=true \\
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
Configure the Kubernetes Proxy
And finally the kube-proxy configuration.
Move the kube-proxy kubeconfig to the right directory:
sudo mv kube-proxy.kubeconfig /var/lib/kube-proxy/kubeconfig
Create the kube-proxy-config.yaml configuration file. Here we define the overall Cluster CIDR network range (10.200.0.0/16):
cat <<EOF | sudo tee /var/lib/kube-proxy/kube-proxy-config.yaml
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
clientConnection:
kubeconfig: "/var/lib/kube-proxy/kubeconfig"
mode: "iptables"
clusterCIDR: "10.200.0.0/16"
EOF
Create the kube-proxy.service systemd unit file:
cat <<EOF | sudo tee /etc/systemd/system/kube-proxy.service
[Unit]
Description=Kubernetes Kube Proxy
Documentation=https://github.com/kubernetes/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-proxy \\
--config=/var/lib/kube-proxy/kube-proxy-config.yaml
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
Start the Worker Services
Reload systemd and enable + start the containerd, kubelet and kube-proxy services:
sudo systemctl daemon-reload
sudo systemctl enable containerd kubelet kube-proxy
sudo systemctl start containerd kubelet kube-proxy
Check via systemd that there are no errors:
sudo systemctl status containerd
sudo systemctl status kubelet
sudo systemctl status kube-proxy
Verification
Exit the multiplexer and copy the admin.kubeconfig file to the first master node (master1):
scp admin.kubeconfig master1:~/
Connect to the first master server via SSH and get the worker nodes via kubectl:
ssh master1 "kubectl get nodes --kubeconfig admin.kubeconfig"
OUTPUT:
NAME STATUS ROLES AGE VERSION
worker1.internal.napo.io Ready <none> 90s v1.16.3
worker2.internal.napo.io Ready <none> 90s v1.16.3
worker3.internal.napo.io Ready <none> 90s v1.16.3
Configuring kubectl for Remote Access
We want to access the Kubernetes Cluster with the kubectl commandline utility from our Bastion Host as well as from our local Workstation.
👍 The Bastion Host already has kubectl installed.
=> On your workstation, you can see here how to install kubectl for all Operating Systems.
The Admin Kubernetes Configuration File
We need to configure the Kubernetes API server endpoint we want to connect to. For High Availability we created an internal Load Balancer that fronts the Kubernetes Master servers (kube-apiservers). The public Load Balancer’s DNS name is our external endpoint for remote access.
Internally, e.g. on the Bastion Host, we use our internal Load Balancer but for external access we use the public-facing one. This may sound unnecessary, but this way we can tighten the SecurityGroups even more.
Bastion Host / Internal access
Generate the kubeconfig file suitable for authenticating as admin user on the Bastion Host:
MASTER_ELB_PRIVATE=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[? starts_with(DNSName, `internal-master`)]| [].DNSName' --output text)
kubectl config set-cluster kubernetes-the-real-hard-way \
--certificate-authority=ca.pem \
--embed-certs=true \
--server=https://${MASTER_ELB_PRIVATE}:6443
kubectl config set-credentials admin \
--client-certificate=admin.pem \
--client-key=admin-key.pem
kubectl config set-context kubernetes-the-real-hard-way \
--cluster=kubernetes-the-real-hard-way \
--user=admin
kubectl config use-context kubernetes-the-real-hard-way
Verify everything works from Bastion Host:
kubectl get nodes
Workstation / Remote access
Copy the admin client cert and key together with the CA cert from the Bastion Host to your local workstation:
~/ca.pem
~/admin.pem
~/admin-key.pem
Generate the kubeconfig file suitable for authenticating as admin user on your workstation.
- you may have to set --region us-east-1 to the region where your infrastructure is running
- you may have to edit the paths to the certs and key if they aren’t in the current directory
MASTER_ELB_PUBLIC=$(aws elb describe-load-balancers --query 'LoadBalancerDescriptions[? starts_with(DNSName, `master`)]| [].DNSName' --region us-east-1 --output text)
kubectl config set-cluster kubernetes-the-real-hard-way \
--certificate-authority=ca.pem \
--embed-certs=true \
--server=https://${MASTER_ELB_PUBLIC}:6443
kubectl config set-credentials admin \
--client-certificate=admin.pem \
--client-key=admin-key.pem
kubectl config set-context kubernetes-the-real-hard-way \
--cluster=kubernetes-the-real-hard-way \
--user=admin
kubectl config use-context kubernetes-the-real-hard-way
Verify everything works from your Workstation:
kubectl get nodes
Hooray congratulations 🤗
Now we have safe remote access to our Kubernetes Cluster. But to really use it, we have to configure the Pod Network routes in the next step.
BE AWARE: If your workstation IP changes, you have to update the MasterPublicLB SecurityGroup to keep access to Kubernetes!
Provisioning Pod Network Routes
Pods scheduled to a node receive an IP address from the node’s Pod CIDR range (the POD_CIDR envvar). At this point Pods cannot communicate with Pods running on other nodes due to missing network routes.
Now it’s time to create the routes in each private AWS Route Table. This establishes a network route from each Worker node’s POD_CIDR to the node’s internal IPv4 address.
ℹ️ This way we do not have to install any additional CNI. As mentioned before, we could use Flannel or another solution for achieving Kubernetes networking.
Routes
Connect back to the Bastion Host and create the network routes for each worker instance via aws-cli.
First get all private Route Tables and save them into the Bash array ROUTE_TABLES:
ROUTE_TABLES=($(aws ec2 describe-route-tables --filters "Name=tag:Attribute,Values=private" --query 'RouteTables[].Associations[].[RouteTableId]' --region us-east-1 --output text))
The next command connects to the Worker nodes via SSH, gets the value of the POD_CIDR envvar and saves it into the Bash array WORKER_POD_CIDRS:
WORKER_POD_CIDRS=()
for i in 1 2 3; do
WORKER_POD_CIDRS+=($(ssh worker$i 'echo $POD_CIDR'))
done
Now create the Routes for every worker node’s POD_CIDR to the node’s ENI (Elastic Network Interface):
for rt in ${ROUTE_TABLES[@]}; do
i=1
for cidr in ${WORKER_POD_CIDRS[@]}; do
ENI_ID=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=worker${i}" --query 'Reservations[].Instances[].NetworkInterfaces[].[NetworkInterfaceId]' --output text)
echo "${rt}: ${cidr} => ${ENI_ID}"
aws ec2 create-route \
--route-table-id ${rt} \
--destination-cidr-block ${cidr} \
--network-interface-id ${ENI_ID}
i=$((i+1))
done
done
OUTPUT:
You should see "Return": true for (number of Workers) x (number of private Route Tables) = 9 entries (by default):
rtb-093ea7f2ab5e6c2d6: 10.200.188.0/24 => eni-0f9e482a3d6ac5797
{
"Return": true
}
rtb-093ea7f2ab5e6c2d6: 10.200.166.0/24 => eni-0487ae6ec86bbef5c
{
"Return": true
}
rtb-093ea7f2ab5e6c2d6: 10.200.152.0/24 => eni-009f0deb164d3fafa
{
"Return": true
}
rtb-00b8aae6926b2e250: 10.200.188.0/24 => eni-0f9e482a3d6ac5797
{
"Return": true
}
rtb-00b8aae6926b2e250: 10.200.166.0/24 => eni-0487ae6ec86bbef5c
{
"Return": true
}
rtb-00b8aae6926b2e250: 10.200.152.0/24 => eni-009f0deb164d3fafa
{
"Return": true
}
rtb-03288ee836e727375: 10.200.188.0/24 => eni-0f9e482a3d6ac5797
{
"Return": true
}
rtb-03288ee836e727375: 10.200.166.0/24 => eni-0487ae6ec86bbef5c
{
"Return": true
}
rtb-03288ee836e727375: 10.200.152.0/24 => eni-009f0deb164d3fafa
{
"Return": true
}
Verify the routes:
for rt in ${ROUTE_TABLES[@]}; do
aws ec2 describe-route-tables --route-table-ids ${rt} | \
jq -j '.RouteTables[].Routes[] | .DestinationCidrBlock, " ", .NetworkInterfaceId // .GatewayId, " ", .State, "\n"'
done
Deploy DNS Cluster Add-on
And as our last step, we configure a DNS add-on which provides DNS based service discovery to all applications running inside our Kubernetes cluster.
The DNS Cluster Add-on
Create the kube-dns.yaml file (working with Kubernetes v1.16):
# Copyright 2016 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# CHANGELOG:
# 08/11/2019
# Support for Kubernetes v1.16 added
# by @hajowieland https://wieland.tech | https://napo.io
#
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "KubeDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: 10.32.0.10
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-dns
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kube-dns
namespace: kube-system
labels:
addonmanager.kubernetes.io/mode: EnsureExists
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
# replicas: not specified here:
# 1. In order to make Addon Manager do not reconcile this replicas parameter.
# 2. Default is 1.
# 3. Will be tuned in real time if DNS horizontal auto-scaling is turned on.
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
labels:
k8s-app: kube-dns
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
volumes:
- name: kube-dns-config
configMap:
name: kube-dns
optional: true
containers:
- name: kubedns
image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
livenessProbe:
httpGet:
path: /healthcheck/kubedns
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
# we poll on pod startup for the Kubernetes master service and
# only setup the /readiness HTTP server once that's available.
initialDelaySeconds: 3
timeoutSeconds: 5
args:
- --domain=cluster.local.
- --dns-port=10053
- --config-dir=/kube-dns-config
- --v=2
env:
- name: PROMETHEUS_PORT
value: "10055"
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
- containerPort: 10055
name: metrics
protocol: TCP
volumeMounts:
- name: kube-dns-config
mountPath: /kube-dns-config
- name: dnsmasq
image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7
livenessProbe:
httpGet:
path: /healthcheck/dnsmasq
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- -v=2
- -logtostderr
- -configDir=/etc/k8s/dns/dnsmasq-nanny
- -restartDnsmasq=true
- --
- -k
- --cache-size=1000
- --no-negcache
- --log-facility=-
- --server=/cluster.local/127.0.0.1#10053
- --server=/in-addr.arpa/127.0.0.1#10053
- --server=/ip6.arpa/127.0.0.1#10053
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
# see: https://github.com/kubernetes/kubernetes/issues/29055 for details
resources:
requests:
cpu: 150m
memory: 20Mi
volumeMounts:
- name: kube-dns-config
mountPath: /etc/k8s/dns/dnsmasq-nanny
- name: sidecar
image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.7
livenessProbe:
httpGet:
path: /metrics
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --v=2
- --logtostderr
- --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
- --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
ports:
- containerPort: 10054
name: metrics
protocol: TCP
resources:
requests:
memory: 20Mi
cpu: 10m
dnsPolicy: Default # Don't use cluster DNS.
serviceAccountName: kube-dns
Deploy kube-dns to the cluster:
kubectl create -f kube-dns.yaml
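Verify that the kube-dns Pods come up and, if you like, test DNS resolution from a temporary Pod (optional check; the busybox image is just an example):
kubectl get pods -n kube-system -l k8s-app=kube-dns
# optional in-cluster DNS test
kubectl run busybox --image=busybox:1.28 --restart=Never --command -- sleep 3600
kubectl wait --for=condition=Ready pod/busybox --timeout=60s
kubectl exec busybox -- nslookup kubernetes.default
kubectl delete pod busybox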
IT IS DONE! Great Work 👨💻
Now deploy some services on your shiny new the-hard-way created Kubernetes Cluster!
# Create deployment of nginx with 10 replicas
kubectl run nginx --image=nginx --replicas=10
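You can then check that the Pods are spread across the workers and get IPs from the workers' POD_CIDR ranges (the run=nginx label is what kubectl run sets by default):
kubectl get pods -l run=nginx -o wide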
Cleaning Up
If you are finished and want to destroy your whole infrastructure, just execute:
# Terraform
terraform destroy
# CDK
cdk destroy
The beauty of Infrastructure as Code 🥰
Further Steps / Ideas
For further training/learning you can do a lot of things with your handmade cluster!
Here just some ideas:
- Deploy an Ingress service (like nginx-ingress / aws-alb-ingress)
- Increase the master/worker node size in the IaC (CDK/Terraform), deploy the changes and join the new nodes to your cluster
- Manually kill etcd/master/worker instances and learn how Kubernetes reacts
- what info do you get?
- where do you find important logs?
- what steps can you take to improve cluster healthiness?
- what happens when the AutoScalingGroup starts a new instance, e.g. a new K8s Worker node? (no certs, keys available for this new IP address, etc.)
- Enhance the UserData to assign ENIs from a pre-defined internal IP address pool (adapt the LaunchConfigurations for etcd, master, worker)
- Get to know why it makes sense to use tools like kubeadm
Final Words
If you encounter any problems or have some ideas on how to enhance the IaC code ➡️ please let me know!
I would be very happy to see some Pull Requests on GitHub for the Terraform and CDK Python code of this blog post: