**Elements of Kubernetes**
published: 17 December 2020
In this article, I lay out the elements of Kubernetes _from an application
developer's perspective_. Familiarity with deploying applications in a
production environment is expected[^familiarity].
Rather than explaining Kubernetes from the ground up and building a pie in the
sky that no-one will eat because it has strawberries and whipped cream and who
likes that, I introduce Kubernetes as a solution for problems you are
encountering in high-traffic production environments right now:
- Your application needs to serve 1000s of requests per second so you need many instances of the application running
- Clients of your application need to connect to one IP address and requests will be routed to the different application instances automatically
- You need to be able to push changes to your application without downtime
- You need to be able to rollback changes to your application
- Your application needs to be _warmed up_: things need to happen before your application can start serving up requests. For example, the application instance needs to sync its data with the data of other instances
- When an instance of your application dies (and is thus no longer able to serve requests), that instance needs to become invisible to clients of your application. Traffic should be guided to still-functioning instances of your application
- To save costs during the night, when you need less capacity for your main application, you want to reuse some of the available capacity to run batch jobs
- There will be times when there will not be enough machines to serve your traffic (during a promotion, or, say, the holiday period), so you want to increase, _scale up_, the number of physical machines you have available. After the busy period, you want to get back to normal and scale down
It will come as no surprise (if you _are_ surprised: Sinterklaas does indeed
exist _and_ the world is a beautiful place where no-one drags themselves
through wondering _where and when in the name it all went wrong_) that
_Kubernetes_ is _a_ solution for those problems. I'll refer to _Kubernetes_ by
its common abbreviation of _k8s_ -- pronounced _kates_, because that's what all
the cool kids do and I'll do the same even though it's aeons since I was a kid.
Even though Kubernetes is a large hammer, it does hammer the above problems squarely
away. I disparagingly say _hammer_ as I say almost everything disparagingly but
also because k8s is able to do more than what you'd need to solve the above
stated problems. Blame that on its historic origins at a Cloud Provider
where many different applications need to run on the same cluster of
machines in a desperate capitalistic attempt to optimize usage of
infrastructure. Given the problems I laid out, I'm not _that_ interested
in running different applications on the same infrastructure, although the
particular requirement to save cost during the night and run some batch jobs on
the same machines that your application runs on during the day, is designed to
hint at that use.
The requirements as laid out above come from my personal experience, and the
limited knowledge I have of k8s comes from _Kubernetes in Action_ by
Marko Lukša. A book that, like all good books, gave me all the misplaced
confidence to play an expert on the Internet. The book goes in
depth and is useful for infrastructure administrators (which is a whole world
of joy unto itself that I will barely touch on here). In this article, I mostly
stick to what I know, application development, and I'll go over every single
one of the above problems/requirements and discuss how k8s helps solve them.
# Instances of your Application
Your application needs to serve 1000s of requests per second so you need many instances of the application running
## A Containerized Application
As an example application, I'll run a web server that serves up the sentence
```
Hi from HOST
```
where `HOST` will be the host the application is running on. We're in a brave
new world, so this application is packed up so it can run as a Docker
container. I posted the source of this application on
and I pushed the container image to the [Docker Hub](https://hub.docker.com/)
with name [stijnh/hi_app](https://hub.docker.com/r/stijnh/hi).
If at this point you feel slightly lost and it's related to this article, this
may be a good time to take a break to read up on the elements of Docker, or
have a Snickers, what do I know.
The container image is public so you can try out the app on your laptop right now with:
```shell
docker container run -d -p 8111:8111 -t stijnh/hi
```
Recall that `docker container run` indicates you're asking to run a container
image; `-d` indicates you're going to do that in the background as a daemon
(rather than interactively), `-p 8111:8111` means you're mapping your `localhost`
port `8111` (on your laptop) to port `8111` on your docker container behind which
your app is running, and the final argument `stijnh/hi` indicates the image
you're using to get the container from (`-t` allocates a pseudo-TTY).
Open your browser and navigate to `localhost:8111`. You will see
```
Hi from fbd698569ec5
```
where `fbd698569ec5` will be different for you as this
is the container's hostname.
At this point you have _1 instance of your application running_. It does so as
a container on your laptop. Your laptop is the _machine_ (or for people that
like their engineering terms like their sunsets, with drama, your _bare metal_).
## Multiple Instances of your Application
You have 1 instance of your app running, but you'll have thousands of users that
you need to say hi to, so you can't have only 1 instance -- you have dreams,
megalomaniac dreams! More instances! At least 3!
But first I need to define a tad more precisely what we need 3 of. If I need 3
instances of the application, I need 3 containers (each container runs
the app). As this example is focused on a particular set of requirements that k8s solves, I've
not mentioned other real-world requirements like logging. Typically, your
application will be writing logs, and you'll need something to _rotate_ these
logs away to more permanent storage (to [AWS's S3](https://aws.amazon.com/s3/)
for example, where you could query it using
[Athena](https://aws.amazon.com/athena/)). I want this process of log
rotating in a separate container (other apps could use it as well so it's not
specific to this app). I want these 2 containers always running
together[^sidecar], so if I say I want 3 instances of my app, I actually want 3
instances of my app with the logging container. Hence, the first level of
abstraction -- the base unit k8s deals with -- is _not_ a container, it's
something called a _pod_. So let's do some more wrapping, and wrap that
container into a _Pod_ manifest[^manifest] that describes what the pod should
look like. If you recall `Dockerfile`s, then this is similar in philosophy: in
a `Dockerfile` you describe your app so that `docker` knows how to build an
image, in a _Pod_ manifest you describe what all goes in your _Pod_, most
importantly what container images to use so that k8s knows how to create a Pod out
of several containers.
Pods can be defined in [YAML](https://en.wikipedia.org/wiki/YAML) files, so
I'll have a file `hi-pod.yaml`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hi-pod
spec:
  containers:
  - image: stijnh/hi
    name: hi
    ports:
    - containerPort: 8111
```
which is not doing much more than naming my pod `hi-pod` and indicating that
it's running the container [`stijnh/hi`](https://hub.docker.com/r/stijnh/hi).
Let's go over the pod definition, line by line:
`apiVersion: v1` indicates what API version of k8s this k8s _resource_ is defined
for (_Pods_, together with a whole lot of other things, are named _resources_ in
k8s), in this case `v1`. I have no memory, nor patience, for remembering API versions,
but there are [explanations of what version to use](https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-apiversion-definition-guide.html).

`kind: Pod` indicates the type of resource you're defining, a `Pod`.

The `metadata` section indicates metadata about the pod, in this case its name `hi-pod`.
```yaml
spec:
  containers:
  - image: stijnh/hi
    name: hi
    ports:
    - containerPort: 8111
```

indicates the pod's specification, in this case a list of 1 container, where
the container is described by its image tag `stijnh/hi`, given the name `hi`, and
exposing the port `8111`. A pattern worth remembering is the 4 sections that
you will see with other types of resources as well: the `apiVersion`, the
`kind`, the `metadata`, and the `spec`.
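To make the logging-sidecar idea from earlier concrete, here's a sketch of what a two-container pod manifest could look like. The `log-rotator` image, its name, the shared volume, and the assumption that the app writes its logs to a file under `/var/log/hi` are all hypothetical, stand-ins for whatever log shipper and log path you'd actually use:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hi-pod
spec:
  containers:
  - image: stijnh/hi            # the app (assumed to write logs to /var/log/hi)
    name: hi
    ports:
    - containerPort: 8111
    volumeMounts:
    - name: logs
      mountPath: /var/log/hi
  - image: example/log-rotator  # hypothetical sidecar that ships logs off the node
    name: log-rotator
    volumeMounts:
    - name: logs
      mountPath: /var/log/hi
  volumes:
  - name: logs                  # scratch space shared by both containers
    emptyDir: {}
```

Both containers live and die together: if the pod is rescheduled, they move as one unit, which is exactly the behavior we wanted from "3 instances of my app with the logging container".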
To deploy this pod, I assume you're on your laptop and have something like
[minikube](https://github.com/kubernetes/minikube) installed. As I'm tackling
k8s from the perspective of an application developer, I do not tackle setting
up more than a toy cluster using `minikube`. If you want to sing along, go
check out [minikube](https://github.com/kubernetes/minikube) and install it.
While installing `minikube`, you might have read that minikube will give you a
_single-node cluster_. You have only ever heard cluster in the context of "What
a cluster...! Who ate the last Oreos??!!!" Your feeling for the English
language tells you that clusters may be related to that exclamation of frustration,
but not exactly the same. My working definition of a _cluster_ is that it's a
set of nodes. What's a _node_? Again, my working definition is _that's a
machine, an EC2 instance, a laptop, some kind of thing with a CPU and
memory, an old Pentium tower in a dusty basement, ..._ A typical mapping in my
head would be _if I have 40 [EC2](https://aws.amazon.com/ec2/) instances in the EU region to serve traffic_,
that's a _cluster_ of _40_ nodes.
With `minikube` running, try this:
```shell
$ kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
minikube   Ready    master   0d    v1.18.3
```
`kubectl` is the command-line tool for interacting with your cluster (asking it
about nodes, pods, ...). The output of the command shows 1 node, called
`minikube`. All later examples involve that 1 node. In real life, you'll
work with clusters that have many more nodes.
We have a cluster with 1 node. We have our `hi` app, tucked away in a
container, and we defined a pod that is supposed to run that container. Let's
deploy that pod using `kubectl`:
```shell
$ kubectl create -f hi-pod.yaml
```
Verify that your pod is created:
```shell
$ kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
hi-pod   1/1     Running   0          54s
```
This shows that the pod is `Running` and has been doing so for `54s`. Note the levels of indirection:
- when you ran the app `hi.js` on your laptop, you could navigate to `localhost:8111` and see the `Hi from...`
- when you ran the app in a container, you needed to forward the localhost's port `8111` _into_ the container using `docker container run -d -p 8111:8111 -t stijnh/hi`
- when you run the app in a container in a pod, you need to do what?
You could forward the port via `kubectl`, but that is mostly for debugging.
I'll answer this question in the next section where I'll talk about how to make
your pod visible to other pods. For now, I'll just show how to check your logs
on the pod with name `hi-pod`:
```shell
$ kubectl logs hi-pod
hi: listening on port 8111 of host hi-pod
```
That's indeed what the app logs when you start it. Note that the host is listed as `hi-pod` which
corresponds to the name we gave the pod in `hi-pod.yaml`.
See more details for your pod by doing a `describe`:
```shell
$ kubectl describe pod hi-pod
```
Scroll down to the last section, `Events`, which lists what happened to the pod
(scheduling, pulling the image, starting the container):

```shell
Type    Reason     Age   From    Message
----    ------     ----  ----    -------
```
"Yes, but no," I say. "The unit of operation when your application is managed by k8s is a pod, so we want 4 pods, each in turn running 1 container."
"So I _was_ right," you say.
"Stop being pedantic and pay attention." There's a k8s resource type (the first one we met was a `Pod`) that allows to specify how many pods k8s to keep around: a [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/). The word on the street is that you should specify more than 1 pod, to ward off loneliness. As with pods, you specify a deployment using a YAML manifests. The line that gets me my 4 pods is: ```yaml spec: replicas: 3 ``` Glad to see you're awake Grasshopper. Four of course: ```yaml spec: replicas: 4 ``` How will you describe _which_ pod you want 4 replicas of? Kubernetes uses _labels_ to determine which replicas you're talking about. On the _Pod_ side, you have to make sure your pods are labeled, and then you indicate in your `Deployment` what labels you're describing. An example of a label is `app=hi`. Another label could be `tier=dev`, or `color=blue`. The latter is obviously no good, `color=yellow` is be better. Labels are _key/value pairs_, and you can have multiple different labels on a pod (but you cannot have 2 labels with the same _key_: so `color=blue` and `color=yellow` does not work). On the `Deployment` side, you then indicate that you want 3 replicas of all pods matching _these labels_. For example, you indicate that you are describing pods with label `app=hi`: ```yaml spec: selector: matchLabels: app: hi ``` Easy enough.
"But but, we did not give our pod this label, so there's nothing to find."
Yes. The final piece of such a `Deployment` is to describe what kind of pods you want to create, a so-called _pod template_:

```yaml
spec:
  template:
    metadata:
      labels:
        app: hi
    spec:
      containers:
      - image: stijnh/hi
        name: hi
        ports:
        - containerPort: 8111
```

The latter looks like the pod definition earlier, except that it also specifies that the pod needs to have a label `app=hi` as metadata. In fact, now that we have a deployment you can forget all about that initial pod definition. This `Deployment` knows all it needs to know to create your 3 pods (_4! pay attention!_). Put it all together:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hi-deployment
spec:
  replicas: 4
  selector:
    matchLabels:
      app: hi
  template:
    metadata:
      labels:
        app: hi
    spec:
      containers:
      - image: stijnh/hi
        name: hi
        ports:
        - containerPort: 8111
```

This specification describes a `Deployment` with name `hi-deployment`. It specifies that at all times there need to be 4 replicas of a `Pod` matching `app=hi`. The `template` indicates how the `Deployment` will go about creating new pods: it will run the container `stijnh/hi` and it will label the pod with `app=hi`. The latter is important as this ensures that the just-created Pod is managed by the `Deployment`. Imagine if we created a `Pod` with label `app=yo`: the `Deployment` would never conclude it reached 4 replicas of pods with label `app=hi`, so pods would keep on getting created.
Before you try this `Deployment`, delete all resources you have so far, and then check that you have 0 pods running:

```shell
$ kubectl delete all --all
$ kubectl get pods
```

Now, create a deployment using the usual `create` based on the above YAML (`hi-deployment.yaml`):

```shell
$ kubectl create -f hi-deployment.yaml
```

And then watch the magic:

```shell
$ kubectl get pods
NAME                             READY   STATUS              RESTARTS   AGE
hi-deployment-5f7b895fd9-5hb5h   0/1     ContainerCreating   0          3s
hi-deployment-5f7b895fd9-dt8kf   0/1     ContainerCreating   0          3s
hi-deployment-5f7b895fd9-r58kc   0/1     ContainerCreating   0          3s
hi-deployment-5f7b895fd9-wgwpp   0/1     ContainerCreating   0          3s
```

Note the `STATUS` `ContainerCreating`, and then a couple of seconds later:

```shell
$ kubectl get pods
NAME                             READY   STATUS    RESTARTS   AGE
hi-deployment-5f7b895fd9-5hb5h   1/1     Running   0          8s
hi-deployment-5f7b895fd9-dt8kf   1/1     Running   0          8s
hi-deployment-5f7b895fd9-r58kc   1/1     Running   0          8s
hi-deployment-5f7b895fd9-wgwpp   1/1     Running   0          8s
```

Verify that the pods have the label `app=hi` so they're under management of the `Deployment`:

```shell
$ kubectl get pods --show-labels
NAME                             READY   STATUS    RESTARTS   AGE     LABELS
hi-deployment-5f7b895fd9-5hb5h   1/1     Running   0          2m40s   app=hi,pod-template-hash=5f7b895fd9
hi-deployment-5f7b895fd9-dt8kf   1/1     Running   0          2m40s   app=hi,pod-template-hash=5f7b895fd9
hi-deployment-5f7b895fd9-r58kc   1/1     Running   0          2m40s   app=hi,pod-template-hash=5f7b895fd9
hi-deployment-5f7b895fd9-wgwpp   1/1     Running   0          2m40s   app=hi,pod-template-hash=5f7b895fd9
```

Yep, there we have it, `app=hi`. I promised that the `Deployment` specifies that at all times there need to be 4 replicas. So what if you delete one?
For example, delete the first pod:

```shell
$ kubectl delete pod hi-deployment-5f7b895fd9-5hb5h
```

Then check your pods with label `app=hi` -- you can use the `-l` flag to only see pods with that label:

```shell
$ kubectl get pods -l app=hi
NAME                             READY   STATUS    RESTARTS   AGE
hi-deployment-5f7b895fd9-dt8kf   1/1     Running   0          6m6s
hi-deployment-5f7b895fd9-h7mz7   1/1     Running   0          82s
hi-deployment-5f7b895fd9-r58kc   1/1     Running   0          6m6s
hi-deployment-5f7b895fd9-wgwpp   1/1     Running   0          6m6s
```

Still 4 pods, shucks! But if you look at the `NAME`s you see that `hi-deployment-5f7b895fd9-5hb5h` is gone and replaced by the new pod `hi-deployment-5f7b895fd9-h7mz7` (note its `AGE` of only `82s`). The `Deployment` guaranteed that 4 replicas are running at all times. The Zombie Apocalypse is nigh.

We checked the 4 pods using `kubectl get pods`. You can also get more info on the `Deployment` resource:

```shell
$ kubectl get deployment
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
hi-deployment   4/4     4            4           8m48s
```

Note that it indicates that `4` out of `4` pods are ready. In this article, I'll stay at the abstraction of `Deployment` and `Pods`, but if you do the following, you'll see that there's something called a [`ReplicaSet`](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/) created as well:

```shell
$ kubectl get replicaset
NAME                       DESIRED   CURRENT   READY   AGE
hi-deployment-5f7b895fd9   4         4         4       10m
```

It's actually this `ReplicaSet`, created by the `Deployment`, that makes sure that the replicas are kept at `4` (see the `DESIRED` column). The `Deployment` does not create the pods directly. I'll not go into more detail around `ReplicaSet`s as you'll usually not deal with them directly.

![A Deployment](./diagrams/deployment.svg)

In the previous, I kept referring to 4 instances of the application or 4 pods, not 4 nodes. The 4 pods running the containers are all running on the same node on your laptop.
In reality, you'll have more than 1 node to run your application, and your application instances may have been scheduled on different nodes to accommodate, for example, the CPU requirements of your application.

# Clients of your Application

Recall the 2nd requirement I listed in the Introduction:

Clients of your application need to connect to one IP address and requests will be routed to the different application instances automatically

## (Internal) Clients on the Same Cluster

First consider clients on the same cluster. Why is it tricky for clients to connect to your application?

- Your application runs on several pods. Each pod has its own IP. For a client to connect to a pod, it needs to know the IP of the pod, but since pods are _ephemeral_ (they could be removed when a node fails etc.), that IP may change, so even if your client has the IP of the pod, that IP may become invalid over time
- There are multiple instances of your application running (4 in our case), so the client needs to know all 4 IPs and then select one to connect to

How do you avoid a client needing to know that list of ever-changing IPs? Another k8s resource to the rescue: a _Service_. A _Service_ gives you 1 IP and load-balances requests to that IP by redirecting them to the pods that are able to serve the request. As with _Deployments_ before, one question is how the Service will know what Pods it fronts. Labels! Remember that all of the pods in our current deployment have label `app=hi`. We can define a _Service_ that provides the IP and load-balancing for exactly those pods.
Create a file `hi_service.yaml`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hi-service
spec:
  ports:
  - port: 80
    targetPort: 8111
  selector:
    app: hi
```

I defined a service (`kind: Service`), a name for the service (`hi-service`), and I specified that port `80` of the service forwards to port `8111` of the container (if you scroll up, you can see that `8111` is indeed the port our `hi` app is listening on). Finally, by using a `selector`, I specified that this service fronts pods with label `app=hi`. Create the service as usual:

```shell
$ kubectl create -f hi_service.yaml
```

And verify it's there:

```shell
$ kubectl get services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
hi-service   ClusterIP   10.105.43.171
```