Kubernetes
Kubernetes is a platform for running and orchestrating container workloads. It has become the de facto standard for scaling out cloud-native apps (automatically, if needed) and for deploying your open-source zoo quickly and independently of any cloud provider. No lock-in here. You could also use OpenShift or OKD. With the latest version, they added the OperatorHub, from which you can install 182 items (as of this writing) with just a few clicks. Also, check out Managed Data Stacks, which were created to mitigate exactly that setup effort.
More reasons for Kubernetes include the move from infrastructure as code towards infrastructure as data, specifically as YAML. All resources in Kubernetes, including Pods, Configurations, Deployments, Volumes, etc., can simply be expressed in a YAML file. Developers can quickly write applications that run across multiple operating environments. Costs can be reduced by scaling down (even to zero with, e.g., Knative) and also by using plain Python or other programming languages instead of paying for a service on Azure, AWS, or Google Cloud. Its modularity and abstraction make management easy, and with the use of containers (Docker or Rocket), you can monitor all your applications in one place.
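To make "infrastructure as data" concrete, here is a minimal sketch of a single Pod expressed as YAML. The name `hello`, the label, and the `nginx:1.27` image are placeholders chosen for illustration, not something this note prescribes.

```yaml
# Minimal Pod manifest (hypothetical name, label and image).
apiVersion: v1
kind: Pod
metadata:
  name: hello
  labels:
    app: hello
spec:
  containers:
    - name: hello
      image: nginx:1.27        # any container image would do here
      ports:
        - containerPort: 80    # port the container listens on
```

Saved as a file, a manifest like this can be handed to the cluster with `kubectl apply -f <file>`.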
To get hands-on with Kubernetes, you can install Docker Desktop with Kubernetes included. All of my examples are built on top of it and run on any cloud as well as locally. For a more sophisticated set-up in terms of Apache Spark, I suggest reading the blog post from Data Mechanics about Setting up, Managing & Monitoring Spark on Kubernetes. If you prefer video, An introduction to Apache Spark on Kubernetes covers the same content and adds even more on top of it.
As mentioned above, if setting up Kubernetes is too hard, there are Managed Data Stacks, which let you pick from existing open-source tools.
Security: separation of concerns, for example with different namespaces.
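As a rough sketch of separating concerns by namespace, the manifests below create two namespaces and place a ConfigMap in only one of them. The names `team-a`, `team-b`, and `app-settings` are hypothetical.

```yaml
# Two namespaces (hypothetical names) to keep teams or environments apart.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-b
---
# This ConfigMap exists only in team-a; team-b can define its own
# object with the same name without any conflict.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-settings
  namespace: team-a
data:
  LOG_LEVEL: "info"
```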
# Kubernetes Orchestration
Continuously working towards a desired state.
- Everything is represented as a “Kubernetes Resource”
- A `Pod` is the smallest “schedulable” resource (~= container)
- A Manifest (YAML) defines the desired state of a resource
- Kubernetes drives “reality” to the desired state
- The current state is updated based on “reality”
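The desired-state idea above can be sketched with a Deployment manifest. Everything under `spec` is the desired state you declare; Kubernetes writes what it observes back under `status` (for example `status.readyReplicas`). The name, label, and image are assumptions for illustration.

```yaml
# Desired state: three replicas of a (hypothetical) "hello" Pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 3                  # desired number of Pods
  selector:
    matchLabels:
      app: hello               # must match the template labels below
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginx:1.27
```

If one of the three Pods disappears, the controllers notice that reality no longer matches `replicas: 3` and create a replacement.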
# Kubernetes Architecture
- etcd: defines and documents:
  - current known state
  - desired state
```mermaid
graph LR
  subgraph node
    kubelet["kubelet & kube-proxy"]
    containerd
    container
  end
  subgraph control_plane
    subgraph etcd
      kubernetes_resource
    end
    controllers
    kube-api
    scheduler[Default Scheduler]
  end
  subgraph yaml_file
    resource_configurations
  end
  resource_configurations --> kubectl
  kubectl --> kube-api
  controllers -->|adapts| kube-api
  scheduler -->|adapts| kube-api
  kube-api -->|informs| scheduler
  kube-api -->|informs| controllers
  kube-api -->|manages| kubernetes_resource["kubernetes resource:<br/>- current known state<br/>- desired state"]
  kube-api -->|informs| kubelet
  kubelet -->|updates state| kube-api
  kubelet -->|manages| containerd
  containerd --> container
```
Kubernetes Architecture
# Workload Resources
```mermaid
graph TD
  subgraph Workload Resources
    deployment-->replicaset-->pod
    statefulset-->pod
    daemonset-->pod
    cronjob-->job-->pod
    pod[Pod]-.->container
    container[Container]
    style container stroke-dasharray: 5 5
  end
```
- Pods - smallest schedulable unit ~= container
- Deployment - declarative updates for Pods
  - ReplicaSet - ensures a specified number of Pods
- StatefulSet - manages stateful applications
- DaemonSet - ensures a Pod on each node
- CronJob - runs Jobs on a schedule
  - Job - runs Pods to completion
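The CronJob → Job → Pod chain from the list can be sketched in one manifest: the CronJob holds a Job template, which in turn holds a Pod template. The name, schedule, and command are made up for illustration.

```yaml
# CronJob -> Job -> Pod, nested in a single manifest (hypothetical values).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"            # every day at 02:00
  jobTemplate:                     # the Job created on each run
    spec:
      template:                    # the Pod the Job runs to completion
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: busybox:1.36
              command: ["sh", "-c", "echo generating report"]
```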
# Deployment Patterns
# Container deployments
When should you use multiple containers inside a deployment?
In Kubernetes, it’s common to run multiple containers within a single Pod when the containers are tightly coupled application components that need to operate together. Outside of the patterns below (Sidecar, Adapter, Ambassador), putting multiple containers in the same Pod is an anti-pattern. A database or another important service usually gets its own Pod deployment.
- Shared Storage: Containers in the same Pod share the same storage volumes. This can be beneficial for situations where one container writes to a shared volume and another reads from it.
- Inter-process Communication: Since containers in the same Pod share the same network namespace, they can easily communicate with each other using `localhost` and share the same port space.
- Sidecar Pattern: A common use case is the sidecar pattern, where the main application might need an auxiliary helper that pushes logs or data elsewhere. For example, one container might serve a web application while a sidecar container pushes logs or data to an external source.
- Adapter Pattern: You can use a second container to modify or adapt the data output of the main container in some way. For example, transforming output formats or adapting legacy systems to more modern requirements.
- Ambassador Pattern: A container can proxy or shuttle network connections for the main container. This can be used for sharding or partitioning in distributed systems.
An init container is another kind of container, but init containers are specified in a separate part of the deployment (`initContainers` in the Pod spec) and run to completion before the application containers start.
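The original example is not preserved here, so the following is only a sketch of how an init container, a main container, and a sidecar might share a volume in one Pod. All names, images, and commands are hypothetical; in a Deployment the same fields would sit under `spec.template.spec`.

```yaml
# Pod with an init container, a main container and a log sidecar
# sharing one emptyDir volume (all names and commands are made up).
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  volumes:
    - name: logs
      emptyDir: {}                 # scratch volume shared by all containers
  initContainers:                  # separate section; runs before "containers"
    - name: prepare-logs
      image: busybox:1.36
      command: ["sh", "-c", "touch /var/log/app/access.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
  containers:
    - name: web                    # stand-in for the main application
      image: busybox:1.36
      command: ["sh", "-c", "while true; do date >> /var/log/app/access.log; sleep 5; done"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper            # sidecar: reads what the main container writes
      image: busybox:1.36
      command: ["sh", "-c", "tail -f /var/log/app/access.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
```

Here the sidecar only prints the log to its own stdout; a real one would forward it to an external system.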
# Services (Network)
Kubernetes provides several types of Services to expose your application inside or outside of a cluster. Let’s break them down:
- ClusterIP: This is the default service type.
  - Scope: Internal to the cluster.
  - Purpose: Provides a single IP address and port pair which routes traffic to the underlying Pods.
  - Use-case: When you want to expose your service only within the Kubernetes cluster, for example, a backend service that should not be exposed to external traffic.
- NodePort: Exposes the service on each Node’s IP at a static port.
  - Scope: External, using the `<NodeIP>:<NodePort>` combination.
  - Purpose: Allocates a port from a specified range (default: 30000-32767) on each node and forwards traffic on that port to the service.
  - Use-case: Useful for development and debugging, but typically not used directly for production workloads exposed externally.
- LoadBalancer: Provisions an external load balancer in a cloud provider’s infrastructure and directs external traffic to the Kubernetes service.
  - Scope: External.
  - Purpose: Integrates with cloud providers to automatically provision an external load balancer pointing to the NodePort and ClusterIP services.
  - Use-case: When running Kubernetes in a cloud provider that supports automatic load balancer provisioning (like AWS, GCP, Azure), this is a straightforward way to expose services to external traffic.
- ExternalName: Maps a service to a DNS name, rather than an IP.
  - Scope: External.
  - Purpose: Returns a CNAME record pointing to the specified external name.
  - Use-case: Useful when you want to point a service to an external service outside the cluster without proxying traffic through Kubernetes.
- Headless Service: Service without a ClusterIP.
  - Scope: Internal to the cluster.
  - Purpose: Allows direct pod-to-pod communication without a virtual IP in the middle.
  - Use-case: Useful for stateful applications like databases where direct pod addressing is preferable.
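A sketch of how these service types differ in a manifest: mostly it comes down to `spec.type` (or `clusterIP: None` for headless). The service names, the `app: hello` selector, the node port, and `db.example.com` are assumptions for illustration.

```yaml
# ClusterIP (the default when no type is given)
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello            # routes to Pods carrying this label
  ports:
    - port: 80            # port of the Service
      targetPort: 80      # port on the Pod
---
# NodePort: reachable on <NodeIP>:30080
apiVersion: v1
kind: Service
metadata:
  name: hello-nodeport
spec:
  type: NodePort
  selector:
    app: hello
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30080     # must lie in the node port range (default 30000-32767)
---
# Headless: no virtual IP, DNS resolves to the Pod IPs directly
apiVersion: v1
kind: Service
metadata:
  name: hello-headless
spec:
  clusterIP: None
  selector:
    app: hello
  ports:
    - port: 80
---
# ExternalName: DNS alias (CNAME) to a name outside the cluster
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: db.example.com
```

A LoadBalancer service would look like the NodePort one with `type: LoadBalancer`, leaving the provisioning to the cloud provider.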
Ingress: Ingress is not a service type, but a separate Kubernetes resource designed for HTTP and HTTPS routing to services.
- Scope: External.
- Purpose: Allows you to define HTTP and HTTPS routes, host-based routing, path-based routing, SSL/TLS termination, and other advanced routing features. Ingress requires an Ingress Controller (like nginx, traefik, or others) to function.
- Use-case: When you want to expose multiple services under the same IP address with path- or host-based routing, and especially when you need SSL/TLS termination.
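As a hedged sketch, an Ingress with host-based routing and TLS termination could look like the following. It assumes an Ingress Controller is installed and registered under the class name `nginx`, that a Service named `hello` exists, and that a TLS Secret `hello-tls` has been created; the host name is a placeholder.

```yaml
# Ingress routing hello.example.com to a (hypothetical) Service "hello".
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello
spec:
  ingressClassName: nginx          # assumption: controller class is named "nginx"
  tls:
    - hosts:
        - hello.example.com
      secretName: hello-tls        # assumption: Secret with certificate and key
  rules:
    - host: hello.example.com      # host-based routing
      http:
        paths:
          - path: /                # path-based routing
            pathType: Prefix
            backend:
              service:
                name: hello
                port:
                  number: 80
```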
Decision Points:
- If you need simple internal communication: Use ClusterIP.
- For quick external exposure, especially during development: Use NodePort.
- If you’re using a cloud provider that supports it and need simple external exposure: Use LoadBalancer.
- To map a service to a DNS name: Use ExternalName.
- For direct pod-to-pod communication: Use a Headless Service.
- To expose HTTP/HTTPS applications with routing, SSL, etc.: Use Ingress.
As Kubernetes continues to evolve, there might be additional service types or routing mechanisms in the future. Always refer to the official Kubernetes documentation for the most up-to-date information.
# Pod Types
# Evicted
Evicted pods in Kubernetes are pods that have been terminated and removed from nodes due to various reasons, such as:
- Node pressure: When a node is under resource pressure (e.g., low on memory or disk space), Kubernetes may evict pods to free up resources.
- Quality of Service (QoS): Lower priority pods might be evicted to make room for higher priority pods.
- Node maintenance: Pods may be evicted when a node is being drained for maintenance.
Evicted pods remain in the cluster’s API server but are not running on any node. They stay in the “Evicted” state until they are manually deleted or automatically cleaned up by the cluster (depending on your cluster’s configuration).
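To delete all evicted pods in a specific namespace, one option is `kubectl delete pods --field-selector=status.phase=Failed -n <namespace>`. Evicted pods are in the `Failed` phase, so this also removes pods in that namespace that failed for other reasons.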
# Kinds
# DaemonSets
The `desiredNumberScheduled` value of a DaemonSet is a status field and is not set directly. Instead, it’s determined by the number of nodes in your cluster that match the DaemonSet’s node selection criteria. This is why you don’t see a direct option to set this number in the Helm chart.
Here’s how it works:
- By default, a DaemonSet will try to schedule a pod on every node in the cluster.
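To illustrate the node selection criteria, here is a sketch of a DaemonSet restricted by a `nodeSelector`. With it, `desiredNumberScheduled` would equal the number of nodes carrying the label; without it, every schedulable node counts. The label `disktype: ssd`, the names, and the image are hypothetical.

```yaml
# DaemonSet that only runs on nodes labeled disktype=ssd (hypothetical label).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      nodeSelector:
        disktype: ssd              # remove this block to target every node
      containers:
        - name: agent
          image: busybox:1.36
          command: ["sh", "-c", "while true; do sleep 3600; done"]
```

Note that there is no `replicas` field: the count follows from the matching nodes.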
# Alternatives
References: YAML, DevOps engine – Kubernetes