The extra components & services you need to be as productive on-prem as in a major cloud provider’s environment.
What’s missing with on-prem k8s?
After setting up our on-prem Kubernetes cluster, we found that out-of-the-box Kubernetes is not as productive as a major cloud provider’s environment, e.g. AWS, Azure, or GCP. When we work on a platform like AWS, we lean heavily on block storage, load balancing, DNS, certificate management, relational databases, and other services to build an application. While we were able to run containers, we were still lacking critical services for rapidly building applications. For example, there is no out-of-the-box object store like S3, limiting our ability to persist state. In this post, we will walk through how we brought our on-prem Kubernetes cluster to cloud parity, i.e. adding the components and services that make on-prem as productive as the cloud. We’ll detail what was missing from out-of-the-box Kubernetes and how we filled these gaps. Specifically, we will address:
- Missing block storage (e.g. EBS)
- Missing object storage (e.g. S3)
- Missing load balancing (e.g. ELB)
- Missing node level metrics (e.g. CloudWatch)
- Missing certificate life cycle management (e.g. AWS Certificate Manager)
- Missing CI/CD tool chain (e.g. AWS CodeBuild, CodeCommit, CodeDeploy, & CodePipeline)
Missing block & object storage
Out of the box, Kubernetes has no block or object storage services. For example, since we can’t spin up persistent volumes, we can’t run common services such as Prometheus. While we could work around this with local disks, they wouldn’t be replicated across our cluster, meaning every lost node could be a permanent data loss event.
To solve this problem, we deployed Rook + Ceph. Ceph is a replicated storage backend and Rook is a management layer for Ceph on Kubernetes. Together they allow us to provision replicated persistent volumes and object storage buckets. The object storage buckets expose an S3-compatible API, making them a drop-in replacement for existing code. The Rook + Ceph documentation is strong and the tutorial is easy to follow. You will need to prepare your volumes in advance on each node, and then adjust the tutorial’s volume and mount paths to match that configuration.
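As a concrete example, here is a minimal sketch of how a workload requests a replicated volume once Rook + Ceph is running. The storage class name (`rook-ceph-block`) follows the Rook examples, and the claim name and size are placeholders; your storage class and pool settings may differ.

```yaml
# Sketch: request a replicated Ceph volume via the storage class created
# during Rook setup (named rook-ceph-block in the Rook examples; adjust to
# whatever your cluster defines).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data          # hypothetical claim name
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi              # sized for metric retention; pick your own
```

Pods then mount the claim like any other volume; Ceph handles replication across nodes underneath.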
Missing load balancing
Kubernetes also lacks a load balancing mechanism, both external and internal, out of the box (i.e. there is no implementation behind Services of type LoadBalancer after setup). While we could deploy Nginx as an ingress point, this would still leave us with a single point of failure at the container level.
To address this, we used MetalLB + Istio. MetalLB runs a container called speaker on each of our instances that announces a route into our cluster via BGP. We deployed a hardware router that supports equal-cost multi-path routing (ECMP), so it load balances TCP flows (L4) across those routes. Each node in the cluster runs an Istio Ingress Gateway (L7) that receives the TCP traffic and performs load balancing at the HTTP layer. Istio also terminates TLS for the cluster, via an integration with CertManager, and performs HTTP to HTTPS redirection. We also use Istio for internal load balancing via sidecars. Our biggest issue is that Istio is challenging to configure; it takes substantial time to read the docs and understand its many internal components.
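For reference, here is a rough sketch of the BGP setup described above, using MetalLB’s ConfigMap-based configuration (recent MetalLB releases define the same settings with CRDs instead). The ASNs, peer address, and address pool are placeholders for your own network, and the Service shown is illustrative.

```yaml
# Sketch: MetalLB BGP configuration (ConfigMap style used by older releases).
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 10.0.0.1      # placeholder: your ECMP-capable router
      peer-asn: 64500             # placeholder ASN for the router
      my-asn: 64501               # placeholder ASN announced by the speakers
    address-pools:
    - name: default
      protocol: bgp
      addresses:
      - 10.0.100.0/24             # placeholder pool for LoadBalancer IPs
---
# With this in place, a Service of type LoadBalancer gets an IP from the pool,
# each speaker announces a route to it, and the router spreads flows via ECMP.
apiVersion: v1
kind: Service
metadata:
  name: ingressgateway-example    # illustrative name only
spec:
  type: LoadBalancer
  selector:
    app: istio-ingressgateway
  ports:
    - port: 443
      targetPort: 8443
```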
Missing node level metrics
After deployment, there is very little visibility into how your applications are performing. Kubernetes does capture logs out of the box, giving you insight into specific processes, but it provides little information on hardware and cluster-wide issues. This is a particular problem if you intend to run distributed systems, as no individual process can describe the entire system’s behavior.
Luckily, DigitalOcean has produced a good tutorial on setting up node exporter, Prometheus, and Grafana. It is best to set up Rook + Ceph first, as you’ll want persistent disks backing Prometheus’s metric storage. The node exporter exposes node-level hardware metrics, and Prometheus scrapes the exposed endpoints and stores the results. While Prometheus does have a web frontend, it does not work well for metrics discovery or creating permanent dashboards. With the tutorial, Grafana comes preconfigured with useful dashboards, including cluster-wide node performance metrics. Later on, you can add more exporters to capture more metrics.
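To illustrate how the pieces connect, here is a sketch of the Prometheus scrape job for node exporter, assuming the exporter runs as a DaemonSet exposed through a Service named node-exporter (as in the tutorial-style setup); names will differ if you deviate from it.

```yaml
# Sketch: Prometheus scrape job that discovers endpoints via the Kubernetes API
# and keeps only the node-exporter Service's endpoints.
scrape_configs:
  - job_name: 'node-exporter'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: keep
        regex: node-exporter      # assumes the Service is named node-exporter
```

Grafana then queries Prometheus for these metrics (e.g. node_cpu_seconds_total) to render the node dashboards.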
Missing certificate life cycle
One of our favorite additions to AWS was the integration of certificate provisioning into Elastic Load Balancer (ELB). This feature allowed one-click provisioning of a TLS-enabled endpoint, formerly a big headache, as provisioning SSL certs often took hours of filling out forms followed by manual installation.
For Kubernetes, adding CertManager with Let's Encrypt integration is equally seamless. CertManager has easy-to-follow setup instructions. Afterwards, you can define Certificate resources that are provisioned from Let's Encrypt via ACME. You can then configure Istio to use the certificates to secure your ingress gateway’s endpoints. (While Istio can provision self-signed certificates, these are inadequate for public endpoints.) We used DNS verification with the Cloudflare integration. The only issue was that the returned certificate did not have all secret fields populated (see this issue); there is a known workaround. Overall, with MetalLB, Istio, and CertManager, Kubernetes now has an ingress stack with similar capabilities to AWS, Azure, or GCP.
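As a sketch of what this looks like: a Certificate resource referencing a DNS-01 issuer, whose resulting secret the Istio ingress gateway can consume. The issuer name, namespace, and domains are placeholders, and the apiVersion depends on your CertManager release.

```yaml
# Sketch: request a Let's Encrypt certificate via a DNS-01 (Cloudflare) issuer.
# apiVersion is cert-manager.io/v1 on current releases; older releases used a
# different group/version, so check your installed CRDs.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com-tls              # hypothetical name
  namespace: istio-system            # keep the secret next to the gateway
spec:
  secretName: example-com-tls        # referenced by the Gateway's tls.credentialName
  issuerRef:
    name: letsencrypt-dns            # hypothetical ClusterIssuer using Cloudflare DNS-01
    kind: ClusterIssuer
  dnsNames:
    - example.com
    - "*.example.com"
```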
Missing CI/CD
After Kubernetes setup, we still don’t have a place to store code or container images, and we don’t have a mechanism for automating our builds. This is one component that we probably do not want to run on-prem; if our cluster is lost, we don't want to lose our software as well. This is a serious concern because our cluster is located entirely in one place, with no geographic isolation between nodes.
For hosted code, containers, and CI/CD we chose to use GitHub repos, the GitHub container registry, and GitHub Actions. While GitHub Actions is a young service (it’s down often…), the full integration of the entire code-to-container life cycle with automation is compelling. Bringing the code sharing model to actions was a particularly smart feature: you can add actions by referencing GitHub repositories, crowd-sourcing the details of automation. Furthermore, Azure published a GitHub action for pushing manifests to a Kubernetes cluster after a build, allowing for simple, full CI/CD. You will need to expose a Kubernetes API endpoint to the world to use this, so make sure you are using encryption and access control. Two issues we experienced with GitHub Actions are that builds can be slow and that pulling containers to the on-prem cluster is expensive: you pay $0.50 per gigabyte! To address these, we are considering setting up a locally hosted container registry mirror (we still want the containers stored off-site) and running the GitHub Actions runner on-prem.
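As a rough sketch of the pipeline described above: build a container on push, push it to the GitHub registry, then deploy manifests to the cluster with Azure’s Kubernetes actions. The action versions, secret names (REGISTRY_TOKEN, KUBE_CONFIG), manifest path, and image path are placeholders, not a definitive workflow.

```yaml
# Sketch: code-to-container-to-cluster pipeline on GitHub Actions.
name: build-and-deploy
on:
  push:
    branches: [master]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build and push image
        run: |
          echo "${{ secrets.REGISTRY_TOKEN }}" | docker login docker.pkg.github.com -u ${{ github.actor }} --password-stdin
          docker build -t docker.pkg.github.com/${{ github.repository }}/app:${{ github.sha }} .
          docker push docker.pkg.github.com/${{ github.repository }}/app:${{ github.sha }}
      - uses: azure/k8s-set-context@v1
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}   # scoped credentials for the exposed API endpoint
      - uses: azure/k8s-deploy@v1
        with:
          manifests: |
            manifests/deployment.yaml
          images: docker.pkg.github.com/${{ github.repository }}/app:${{ github.sha }}
```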
What’s next?
There are a few days of work needed after setting up Kubernetes in order to make it as productive as the AWS, Azure, or GCP platforms. There is a long list of components and services to set up: Ceph, Rook, MetalLB, Istio, Prometheus, Grafana, CertManager, and GitHub. Afterwards, you still won't have high-level services such as databases, functions as a service, or analytical tools. If your work is primarily integrating those into an application, there are a few more tasks to do. Infrastructure-wise, we still don’t have a machine provisioning service similar to EC2. We tried Ubuntu Landscape, but found it unstable and had to remove it. Later, we will evaluate MAAS, also from Canonical. We also still haven’t hardened our control plane: we are not running replicated etcd or API servers for Kubernetes. Overall, we have cloud parity with the basic platform, but, depending on the nature of your work, there are a few more high- and low-level services left to set up.