Advanced

Advanced Level Overview

This level covers expert mastery and production optimization through 28 examples, achieving 75-95% coverage of production Kubernetes knowledge. Each example demonstrates enterprise patterns needed for secure, scalable, and maintainable cluster operations.

What you'll learn:

RBAC and Security (authentication, authorization, SecurityContext, PodSecurityPolicies)
Network Policies (pod-to-pod traffic control, namespace isolation)
Custom Resources and Operators (extending Kubernetes, custom controllers)
Helm Charts (package management, templating, releases)
GitOps and ArgoCD (declarative deployment automation)
Production Patterns (monitoring, logging, debugging, performance)

Prerequisites: Completion of Beginner (1-28) and Intermediate (29-57) examples or equivalent production Kubernetes experience

RBAC & Security (Examples 58-63)

Example 58: ServiceAccount and RBAC Basics

ServiceAccounts provide identities for Pods while RBAC (Role-Based Access Control) controls permissions. Roles define permissions, RoleBindings grant permissions to ServiceAccounts.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% RBAC authorization flow
graph TD
 A[Pod with ServiceAccount] --> B[API Request]
 B --> C{RBAC Check}
 C --> D[Role: pod-reader]
 D --> E{Has Permission?}
 E -->|Yes| F[Allow: list pods]
 E -->|No| G[Deny: 403 Forbidden]
 
 style A fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style C fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style D fill:#CA9161,stroke:#000000,stroke-width:2px,color:#fff
 style F fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff
 style G fill:#CC78BC,stroke:#000000,stroke-width:2px,color:#fff

# Step 1: Create ServiceAccount (Pod identity)
apiVersion: v1
kind: ServiceAccount # => Identity for Pods (not Users)
metadata:
 name: pod-reader-sa # => ServiceAccount name referenced by Pods
 namespace:
 default # => Namespace-scoped resource
 # => ServiceAccounts cannot be shared across namespaces
 # => Each namespace requires separate ServiceAccount
automountServiceAccountToken:
 true # => Auto-mount token in Pods (default)
 # => Token location: /var/run/secrets/kubernetes.io/serviceaccount/
 # => Contains: token, ca.crt, namespace files
 
---
# Step 2: Define Role (permissions within namespace)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role # => Namespace-scoped permissions
metadata:
 name: pod-reader # => Role name referenced by RoleBinding
 namespace:
 default # => Permissions apply only within default namespace
 # => Cannot access Pods in other namespaces
rules: # => Array of permission rules (OR relationship)
 - apiGroups:
 [""] # => Core API group (v1 resources)
 # => Empty string = core API
 # => apps group = deployments, statefulsets
 # => batch group = jobs, cronjobs
 resources:
 ["pods"] # => Resource types (plural names)
 # => Subresources: pods/log, pods/exec, pods/portforward
 # => Each subresource requires explicit permission
 verbs:
 ["get", "list", "watch"] # => Allowed operations (CRUD subset)
 # => get: fetch single Pod by name
 # => list: retrieve all Pods in namespace
 # => watch: stream Pod changes (long-polling)
 # => MISSING: create, update, patch, delete
 # => Read-only access pattern
 
---
# Step 3: Bind Role to ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding # => Grants Role permissions to subjects
metadata:
 name: read-pods-binding # => RoleBinding name (arbitrary)
 namespace:
 default # => Must match Role namespace
 # => RoleBinding only effective in this namespace
subjects: # => WHO gets permissions (users, groups, ServiceAccounts)
 - kind:
 ServiceAccount # => Subject type (ServiceAccount, User, Group)
 # => User: for kubectl/external access
 # => Group: for multiple users
 # => ServiceAccount: for Pods
 name:
 pod-reader-sa # => ServiceAccount name (must exist)
 # => This ServiceAccount gains Role permissions
 namespace:
 default # => ServiceAccount namespace
 # => Can bind to ServiceAccounts in other namespaces
roleRef: # => WHAT permissions to grant (immutable after creation)
 kind:
 Role # => References Role (namespace-scoped)
 # => Could reference ClusterRole for namespace-scoped use
 # => CANNOT change after RoleBinding creation
 name:
 pod-reader # => Role name (must exist in same namespace)
 # => Role defines actual permissions
 apiGroup:
 rbac.authorization.k8s.io # => RBAC API group
 # => Required for roleRef validation
 
---
# Step 4: Use ServiceAccount in Pod
apiVersion: v1
kind: Pod
metadata:
 name: rbac-pod # => Pod name
spec:
 serviceAccountName:
 pod-reader-sa # => Use custom ServiceAccount
 # => Default: "default" ServiceAccount (every namespace has one)
 # => Token auto-mounted at /var/run/secrets/kubernetes.io/serviceaccount/
 # => Pod inherits ALL permissions from ServiceAccount
 containers:
 - name: kubectl
 image:
 bitnami/kubectl:latest # => Container with kubectl CLI
 # => Uses mounted ServiceAccount token automatically
 # => Token read from standard location
 command: ["sh", "-c"] # => Shell command executor
 args: # => Commands to test RBAC permissions
 - |
 echo "=== Testing RBAC Permissions ==="
 # => Test 1: List pods (ALLOWED)
 echo "Test 1: kubectl get pods"
 kubectl get pods
 # => SUCCESS: Role permits "list" verb on "pods" resource
 # => Returns: list of Pods in default namespace
 
 # => Test 2: Get single pod (ALLOWED)
 echo "Test 2: kubectl get pod rbac-pod"
 kubectl get pod rbac-pod
 # => SUCCESS: Role permits "get" verb on "pods" resource
 # => Returns: Pod details
 
 # => Test 3: List services (DENIED)
 echo "Test 3: kubectl get services"
 kubectl get services || echo "DENIED: services not in Role"
 # => FAILED: 403 Forbidden
 # => Role only permits pods, not services
 
 # => Test 4: Delete pod (DENIED)
 echo "Test 4: kubectl delete pod rbac-pod --dry-run=client"
 kubectl delete pod rbac-pod --dry-run=client || echo "DENIED: delete not permitted"
 # => FAILED: 403 Forbidden
 # => Role missing "delete" verb
 
 # => Test 5: Create pod (DENIED)
 echo "Test 5: kubectl run test --image=nginx --dry-run=client"
 kubectl run test --image=nginx --dry-run=client || echo "DENIED: create not permitted"
 # => FAILED: 403 Forbidden
 # => Role missing "create" verb
 
 sleep 3600 # => Keep Pod running for manual testing

Output (from container logs):

=== Testing RBAC Permissions ===
Test 1: kubectl get pods
NAME READY STATUS RESTARTS AGE
rbac-pod 1/1 Running 0 10s

Test 2: kubectl get pod rbac-pod
NAME READY STATUS RESTARTS AGE
rbac-pod 1/1 Running 0 10s

Test 3: kubectl get services
Error from server (Forbidden): services is forbidden: User "system:serviceaccount:default:pod-reader-sa" cannot list resource "services" in API group "" in the namespace "default"
DENIED: services not in Role

Test 4: kubectl delete pod rbac-pod --dry-run=client
Error from server (Forbidden): pods "rbac-pod" is forbidden: User "system:serviceaccount:default:pod-reader-sa" cannot delete resource "pods" in API group "" in the namespace "default"
DENIED: delete not permitted

Test 5: kubectl run test --image=nginx --dry-run=client
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:default:pod-reader-sa" cannot create resource "pods" in API group "" in the namespace "default"
DENIED: create not permitted

RBAC verification commands (from outside cluster):

# Test permissions as ServiceAccount
kubectl auth can-i list pods --as=system:serviceaccount:default:pod-reader-sa
# => yes (Role permits list on pods)
 
kubectl auth can-i get pods --as=system:serviceaccount:default:pod-reader-sa
# => yes (Role permits get on pods)
 
kubectl auth can-i list services --as=system:serviceaccount:default:pod-reader-sa
# => no (services not in Role rules)
 
kubectl auth can-i delete pods --as=system:serviceaccount:default:pod-reader-sa
# => no (delete verb not in Role)
 
kubectl auth can-i create pods --as=system:serviceaccount:default:pod-reader-sa
# => no (create verb not in Role)
 
# View effective permissions
kubectl describe role pod-reader
# => Shows: get, list, watch on pods
 
kubectl describe rolebinding read-pods-binding
# => Shows: pod-reader-sa bound to pod-reader Role

Key Takeaway: Use ServiceAccounts for Pod identity and RBAC for fine-grained permission control; follow principle of least privilege by granting only required permissions; prefer namespace-scoped Roles over cluster-wide ClusterRoles when possible; use kubectl auth can-i to verify permissions before deployment.

Why It Matters: RBAC is the foundation of Kubernetes security in production environments, enforcing least-privilege access at scale. Proper ServiceAccount configuration prevents privilege escalation attacks (where compromised Pods gain unauthorized cluster access) and enables compliance auditing (proving who can do what in the cluster). Without RBAC, a single vulnerable Pod could compromise the entire cluster — production clusters require RBAC as a mandatory control for SOC 2 and PCI-DSS compliance.

Example 59: ClusterRole and ClusterRoleBinding

ClusterRoles define cluster-wide permissions for non-namespaced resources (nodes, PersistentVolumes) or cross-namespace access. ClusterRoleBindings grant cluster-level permissions.

# ServiceAccount (namespace-scoped, used by cluster-wide ClusterRoleBinding)
apiVersion: v1
kind: ServiceAccount
metadata:
 name: cluster-reader-sa # => ServiceAccount name
 namespace:
 kube-system # => Placed in kube-system (system namespace)
 # => ServiceAccount is namespace-scoped
 # => Can be granted cluster-wide permissions via ClusterRoleBinding
 
---
# ClusterRole (cluster-scoped permissions definition)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole # => Cluster-wide permission template
metadata:
 name:
 node-reader # => ClusterRole name (cluster-scoped)
 # => NO namespace field (cluster-wide resource)
 # => Can be bound cluster-wide OR per-namespace
 # => Reusable across multiple bindings
rules: # => Permission rules (OR relationship between rules)
 # Rule 1: Node access (cluster-scoped resources)
 - apiGroups:
 [""] # => Core API group (v1)
 # => Empty string = core Kubernetes API
 # => Nodes in core API group
 resources:
 ["nodes"] # => Nodes are cluster-scoped
 # => No namespace concept for nodes
 # => One node serves multiple namespaces
 # => Subresources: nodes/status, nodes/proxy
 verbs:
 ["get", "list", "watch"] # => Read-only operations
 # => get: fetch single node by name
 # => list: list all cluster nodes
 # => watch: stream node changes
 # => MISSING: create, delete, update, patch
 # => Prevents node manipulation
 
 # Rule 2: PersistentVolume access (cluster-scoped resources)
 - apiGroups:
 [""] # => Core API group
 # => PersistentVolumes in core API
 resources:
 ["persistentvolumes"] # => PVs are cluster-scoped
 # => Different from PVCs (namespace-scoped)
 # => PVs can be claimed from any namespace
 verbs:
 ["get", "list"] # => Read-only (no watch)
 # => Sufficient for inventory/monitoring
 # => watch not needed for PVs (change less frequently)
 
 # Rule 3: Cross-namespace Pod access (demonstrating namespace resources)
 - apiGroups: [""] # => Core API group
 resources:
 ["pods"] # => Pods are namespace-scoped
 # => ClusterRole + ClusterRoleBinding = access ALL namespaces
 # => ClusterRole + RoleBinding = access SINGLE namespace
 verbs:
 ["get", "list", "watch"] # => Read Pods across all namespaces
 # => Enables cluster-wide monitoring
 
---
# ClusterRoleBinding (grants cluster-wide permissions)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding # => Grants ClusterRole globally
metadata:
 name:
 read-nodes-global # => ClusterRoleBinding name (cluster-scoped)
 # => NO namespace field
 # => Effective across entire cluster
subjects: # => WHO gets permissions (can be multiple)
 - kind:
 ServiceAccount # => Subject type
 # => ServiceAccount is namespace-scoped
 # => Must specify namespace
 name:
 cluster-reader-sa # => ServiceAccount name
 # => This ServiceAccount gains cluster-wide permissions
 namespace:
 kube-system # => ServiceAccount location
 # => ServiceAccount defined in kube-system namespace
 # => Can reference ServiceAccounts from any namespace
roleRef: # => WHAT permissions to grant (immutable)
 kind:
 ClusterRole # => References ClusterRole (cluster-scoped)
 # => CANNOT reference Role (incompatible scopes)
 # => Role is namespace-scoped, won't work here
 name:
 node-reader # => ClusterRole name
 # => ClusterRole must exist
 # => Grants ALL rules from node-reader ClusterRole
 apiGroup:
 rbac.authorization.k8s.io # => RBAC API group
 # => Required for validation

ClusterRole binding patterns:

# Pattern 1: ClusterRole + ClusterRoleBinding = cluster-wide access
# => ServiceAccount can access resources in ALL namespaces
 
# Pattern 2: ClusterRole + RoleBinding = namespace-specific access
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding # => RoleBinding (namespace-scoped)
metadata:
 name: read-nodes-in-namespace # => Binding name
 namespace: production # => Effective only in production namespace
subjects:
 - kind: ServiceAccount
 name: limited-reader-sa # => ServiceAccount in production namespace
 namespace: production
roleRef:
 kind:
 ClusterRole # => References ClusterRole
 # => But binding is namespace-scoped
 name:
 node-reader # => Reuses ClusterRole definition
 # => Permissions limited to production namespace
 apiGroup: rbac.authorization.k8s.io
# => Result: limited-reader-sa can ONLY access Pods in production namespace

Verification commands:

# Test cluster-wide permissions
kubectl auth can-i list nodes --as=system:serviceaccount:kube-system:cluster-reader-sa
# => yes (ClusterRole permits list on nodes cluster-wide)
 
kubectl auth can-i list persistentvolumes --as=system:serviceaccount:kube-system:cluster-reader-sa
# => yes (ClusterRole permits list on PVs)
 
kubectl auth can-i list pods --all-namespaces --as=system:serviceaccount:kube-system:cluster-reader-sa
# => yes (ClusterRole + ClusterRoleBinding grants cross-namespace access)
 
kubectl auth can-i delete nodes --as=system:serviceaccount:kube-system:cluster-reader-sa
# => no (delete verb not in ClusterRole)
 
# View ClusterRole details
kubectl describe clusterrole node-reader
# => Shows: nodes (get, list, watch), persistentvolumes (get, list), pods (get, list, watch)
 
kubectl describe clusterrolebinding read-nodes-global
# => Shows: cluster-reader-sa bound to node-reader ClusterRole globally

Common ClusterRole use cases:

# Use Case 1: Monitoring (read-only across all namespaces)
# => Prometheus, Datadog, New Relic
 
# Use Case 2: Node management
# => Node auto-scalers (Cluster Autoscaler)
 
# Use Case 3: PersistentVolume administration
# => Storage operators (Rook, OpenEBS)
 
# Use Case 4: Cross-namespace operators
# => cert-manager, ingress-nginx, ArgoCD

Key Takeaway: Use ClusterRoles for cluster-wide resources like nodes and PersistentVolumes or cross-namespace access; combine ClusterRole with RoleBinding to limit cluster-wide permissions to specific namespaces; ClusterRole + ClusterRoleBinding grants cluster-wide access while ClusterRole + RoleBinding limits to single namespace.

Why It Matters: ClusterRoles enable platform teams to build monitoring solutions that observe all namespaces without requiring per-namespace RBAC configuration, reducing operational overhead from hundreds of RoleBindings to a single ClusterRoleBinding. This pattern is critical for cluster-wide operators (cert-manager, ingress-nginx) that manage resources across namespaces, and for SRE teams who need read-only cluster visibility for troubleshooting without modifying workloads. Cross-namespace visibility reduces mean-time-to-recovery (MTTR) in production incidents.

Example 60: Aggregated ClusterRoles

Aggregated ClusterRoles combine permissions from multiple ClusterRoles using label selectors, enabling modular permission management and extending built-in roles.

# Parent ClusterRole (aggregates permissions from multiple child ClusterRoles)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
 name:
 monitoring-aggregated # => Parent ClusterRole (container)
 # => Rules auto-populated by aggregation controller
 # => DO NOT manually edit rules field
 labels:
 rbac.example.com/aggregate-to-monitoring:
 "true" # => Self-include label
 # => Optional but conventional
 # => Matches own aggregationRule selector
aggregationRule: # => Defines aggregation behavior
 clusterRoleSelectors: # => Label selectors (OR relationship)
 - matchLabels: # => Matches ClusterRoles with these labels
 rbac.example.com/aggregate-to-monitoring:
 "true" # => Custom label key
 # => Matches child ClusterRoles below
 # => Controller watches for ClusterRoles with this label
 # => Auto-updates when matching ClusterRoles change
rules: [] # => Auto-populated by RBAC aggregation controller
 # => DO NOT manually add rules here
 
---
# Child ClusterRole 1 (defines Pod permissions)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
 name:
 monitoring-pods # => Component ClusterRole
 # => Provides Pod-related permissions
 labels:
 rbac.example.com/aggregate-to-monitoring:
 "true" # => Aggregation label
 # => Triggers inclusion in monitoring-aggregated
 # => Adding this label = instant aggregation
 # => Removing this label = instant removal
rules: # => These rules copied to monitoring-aggregated
 - apiGroups:
 [""] # => Core API group (v1)
 # => Pods in core API
 resources:
 - pods # => Pod resource
 # => Enables listing/getting Pods
 - pods/log # => Pod logs subresource
 # => Required for kubectl logs
 # => Separate permission from pods
 - pods/status # => Pod status subresource
 # => Read Pod status without full Pod access
 verbs:
 ["get", "list", "watch"] # => Read-only operations
 # => get: fetch single Pod
 # => list: list all Pods
 # => watch: stream Pod changes
 # => Typical monitoring permissions
 
---
# Child ClusterRole 2 (defines Metrics permissions)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
 name:
 monitoring-metrics # => Component ClusterRole
 # => Provides metrics-related permissions
 labels:
 rbac.example.com/aggregate-to-monitoring:
 "true" # => Aggregation label
 # => Same label = included in same parent
 # => Can have multiple parents with different labels
rules: # => These rules also copied to monitoring-aggregated
 - apiGroups:
 ["metrics.k8s.io"] # => Metrics API group
 # => Requires metrics-server installed in cluster
 # => Not available by default in some clusters
 resources:
 - pods # => Pod metrics (CPU, memory usage)
 # => Different from core API pods
 # => Used by kubectl top pods
 - nodes # => Node metrics (CPU, memory, disk)
 # => Used by kubectl top nodes
 # => Aggregated metrics from kubelet
 verbs:
 ["get", "list"] # => Read metrics
 # => get: fetch metrics for specific Pod/Node
 # => list: fetch metrics for all Pods/Nodes
 # => NO watch (metrics are point-in-time snapshots)
 
---
# Child ClusterRole 3 (defines Service permissions - optional extension)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
 name:
 monitoring-services # => Additional component ClusterRole
 # => Demonstrates extensibility
 labels:
 rbac.example.com/aggregate-to-monitoring:
 "true" # => Same label = aggregated
 # => Add anytime to extend parent
rules:
 - apiGroups: [""] # => Core API group
 resources:
 - services # => Service resource
 # => Monitor service availability
 - endpoints # => Endpoints resource
 # => Monitor service endpoints
 verbs:
 ["get", "list", "watch"] # => Read-only monitoring
 # => Typical for service discovery

Aggregation result (monitoring-aggregated final permissions):

# Effective rules in monitoring-aggregated (auto-populated):
# rules:
# - apiGroups: [""]
# resources: ["pods", "pods/log", "pods/status", "services", "endpoints"]
# verbs: ["get", "list", "watch"]
# - apiGroups: ["metrics.k8s.io"]
# resources: ["pods", "nodes"]
# verbs: ["get", "list"]
# => Combined from monitoring-pods + monitoring-metrics + monitoring-services

How to extend aggregated ClusterRole:

# Step 1: Create new child ClusterRole with matching label
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
 name: monitoring-configmaps # => New extension
 labels:
 rbac.example.com/aggregate-to-monitoring: "true" # => Use same label
rules:
 - apiGroups: [""]
 resources: ["configmaps"]
 verbs: ["get", "list"]
 
# Step 2: Apply child ClusterRole
# => kubectl apply -f monitoring-configmaps.yaml

Built-in aggregated ClusterRoles (Kubernetes defaults):

# admin role (namespace admin permissions)
# => Aggregation label: rbac.authorization.k8s.io/aggregate-to-admin: "true"
 
# edit role (namespace edit permissions)
# => Aggregation label: rbac.authorization.k8s.io/aggregate-to-edit: "true"
 
# view role (namespace read-only permissions)
# => Aggregation label: rbac.authorization.k8s.io/aggregate-to-view: "true"
 
# Example: Extend built-in view role with custom CRD
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
 name: view-databases-crd # => Extend view role
 labels:
 rbac.authorization.k8s.io/aggregate-to-view: "true" # => Built-in label
rules:
 - apiGroups: ["example.com"]
 resources: ["databases"] # => Custom resource
 verbs: ["get", "list", "watch"] # => Read-only
# => All users with view role now see databases CRD

Verification:

# View aggregated ClusterRole (before children created)
kubectl get clusterrole monitoring-aggregated -o yaml
# => rules: [] (empty)
 
# Create child ClusterRoles
kubectl apply -f monitoring-pods.yaml
kubectl apply -f monitoring-metrics.yaml
 
# View aggregated ClusterRole (after children created)
kubectl get clusterrole monitoring-aggregated -o yaml
# => rules: [pods, pods/log, pods/status (get/list/watch), metrics.k8s.io/pods, nodes (get/list)]
 
# Test aggregation by adding new child
kubectl label clusterrole some-existing-role rbac.example.com/aggregate-to-monitoring=true
# => some-existing-role rules immediately merged into monitoring-aggregated
 
# Test aggregation by removing label
kubectl label clusterrole monitoring-services rbac.example.com/aggregate-to-monitoring-
# => monitoring-services rules immediately removed from monitoring-aggregated

Key Takeaway: Use aggregated ClusterRoles for modular permission management; add new permissions by creating ClusterRoles with matching labels; built-in roles like admin, edit, view use aggregation for extensibility; aggregation enables zero-downtime permission updates without modifying parent ClusterRole or existing bindings.

Why It Matters: Aggregated ClusterRoles power Kubernetes' extensibility model, allowing Custom Resource Definitions (CRDs) to automatically extend built-in roles (admin, edit, view) without manual RBAC updates. This pattern is used by Istio, Knative, and ArgoCD to ensure their custom resources integrate seamlessly with existing RBAC policies. For platform teams building internal developer platforms, aggregation enables adding new capabilities (custom metrics, domain-specific resources) while maintaining consistent permission structures across hundreds of microservices, reducing RBAC configuration drift and security policy gaps.

Example 61: SecurityContext and Pod Security

SecurityContext controls security settings at Pod and container level, including user/group IDs, privilege escalation, filesystem access, and capabilities.

apiVersion: v1 # => Core Kubernetes API
kind: Pod # => Pod with security constraints
metadata: # => Pod metadata
 name: security-context-pod # => Pod name
spec: # => Pod specification
 securityContext: # => Pod-level security settings
 runAsUser:
 1000 # => Run as user ID 1000 (non-root)
 # => Default: container image user (often root)
 runAsGroup: # => Primary group ID for container process
 3000 # => Run as group ID 3000
 # => Primary group for process
 fsGroup:
 2000 # => Filesystem group for volumes
 # => Files owned by group 2000
 # => Enables volume sharing between containers
 fsGroupChangePolicy:
 OnRootMismatch
 # => Change ownership only if needed
 # => Alternative: Always (slower but thorough)
 containers: # => Container list
 - name: secure-app # => Container name
 image:
 nginx:1.24 # => Container image
 # => Image may define default user
 securityContext: # => Container-level (overrides Pod-level)
 allowPrivilegeEscalation:
 false
 # => Prevents gaining more privileges
 # => Blocks setuid binaries
 runAsNonRoot:
 true # => Ensures non-root user
 # => Container fails if user is root
 # => Validates runAsUser setting
 readOnlyRootFilesystem:
 true # => Filesystem is read-only
 # => Use volumes for writable paths
 # => Prevents malware persistence
 capabilities:
 drop:
 - ALL # => Drop all Linux capabilities
 # => Start with zero privileges
 add:
 - NET_BIND_SERVICE # => Add only required capabilities
 # => Bind to ports < 1024
 # => Minimal privilege for nginx
 
 volumeMounts: # => Volume mount list
 - name:
 cache # => Volume mount name
 # => Writable because readOnlyRootFilesystem=true
 mountPath:
 /var/cache/nginx # => Writable volume for cache
 # => nginx needs writable cache directory
 
 volumes: # => Volume definitions
 - name: cache # => Cache volume name
 emptyDir:
 {} # => Ephemeral writable storage
 # => Deleted when Pod deleted
 # => Safe for cache/temp files
 
 
# Security best practices:
# => runAsNonRoot: true (prevent root exploits)

Key Takeaway: Always run containers as non-root with read-only root filesystem; drop all capabilities and add only required ones; use SecurityContext to enforce defense-in-depth security practices.

Why It Matters: SecurityContext prevents container escape vulnerabilities that have affected Docker, containerd, and runc (CVE-2019-5736). Running as non-root with read-only filesystem eliminates the majority of container attack vectors by denying privilege escalation paths. This defense-in-depth approach is mandatory for PCI-DSS and SOC 2 compliance frameworks, and reduces container escape risk by ensuring processes operate with the minimum privileges required for their function.

Example 62: PodSecurityPolicy (Deprecated)

PodSecurityPolicy (PSP) enforces cluster-wide security standards for Pods. PSP is deprecated in Kubernetes 1.21+ and removed in 1.25+, replaced by Pod Security Admission.

# NOTE: PSP deprecated, use Pod Security Standards instead
# Shown for historical context and migration understanding
 
apiVersion: policy/v1beta1 # => Deprecated policy API (removed in k8s 1.25)
kind: PodSecurityPolicy # => PSP resource (legacy; use Pod Security Standards)
metadata: # => PSP metadata
 name: restricted # => PSP name
spec: # => Security policy specification
 privileged: false # => Disallow privileged containers
 allowPrivilegeEscalation: false # => Prevent privilege escalation
 requiredDropCapabilities: # => Capabilities that must be dropped
 - ALL # => Require dropping all capabilities
 volumes: # => Allowed volume types
 - configMap # => ConfigMap volumes allowed
 - emptyDir # => emptyDir (ephemeral) volumes allowed
 - projected # => Projected (composite) volumes allowed
 - secret # => Secret volumes allowed
 - downwardAPI # => Downward API volumes allowed
 - persistentVolumeClaim # => PVC-backed volumes allowed
 runAsUser: # => User ID policy
 rule: MustRunAsNonRoot # => Enforce non-root user
 seLinux: # => SELinux policy
 rule: RunAsAny # => Allow any SELinux context
 fsGroup: # => Filesystem group policy
 rule: RunAsAny # => Allow any fsGroup
 readOnlyRootFilesystem: true # => Require read-only root filesystem
 
---
# Migration to Pod Security Standards (PSS)
# 1. Label namespaces with Pod Security levels
# => kubectl label namespace default pod-security.kubernetes.io/enforce=restricted
# => Levels: privileged, baseline, restricted
 
# 2. Pod Security Admission enforces policies
# => No PSP or RBAC setup needed
# => Built-in controller validates Pods
 
# 3. Restricted level enforces:
# => runAsNonRoot: true
# => allowPrivilegeEscalation: false
# => Drop ALL capabilities
# => Read-only root filesystem
 
# Check current PSS enforcement on namespace:
kubectl get namespace default -o jsonpath='{.metadata.labels}'
# => {"pod-security.kubernetes.io/enforce":"restricted"}
 
# Test PSS enforcement (should fail with restricted policy):
kubectl run test --image=nginx --restart=Never
# => Error: pods "test" is forbidden: violates PodSecurity "restricted:latest"

Key Takeaway: Migrate from deprecated PodSecurityPolicy to Pod Security Standards by labeling namespaces with enforcement levels (privileged, baseline, restricted); Pod Security Admission provides simpler cluster-wide security enforcement.

Why It Matters: PodSecurityPolicy's complexity led to widespread misconfigurations and security gaps (misconfigured PSPs at Capital One contributed to their 2019 breach). Pod Security Standards (PSS) simplify security enforcement with three predefined profiles (privileged, baseline, restricted) that are easier to audit and maintain. Companies like Red Hat and VMware migrated their OpenShift and Tanzu platforms to PSS before the 1.25 removal deadline. This migration is mandatory for Kubernetes 1.25+, affecting all production clusters running workloads that previously relied on PodSecurityPolicy for security enforcement.

Example 63: Secrets Encryption at Rest

Kubernetes Secrets are base64-encoded by default (not encrypted). Enable encryption at rest using EncryptionConfiguration to protect sensitive data in etcd.

# File: /etc/kubernetes/encryption-config.yaml
# Location: Control plane node (where kube-apiserver runs)
# Purpose: Encrypt Secrets and ConfigMaps in etcd at rest
 
apiVersion: apiserver.config.k8s.io/v1 # => EncryptionConfiguration API
kind: EncryptionConfiguration # => Encryption settings for kube-apiserver
resources: # => Resource types to encrypt
 - resources: # => First resource group
 - secrets # => Encrypt Secret objects
 # => Contains passwords, tokens, certificates
 # => Base64-encoded by default (NOT encrypted)
 - configmaps # => Optionally encrypt ConfigMaps
 # => Usually not sensitive, but can contain credentials
 # => Encrypting increases etcd overhead
 providers: # => Encryption providers (tried in order)
 # => First provider encrypts new data
 
 # Primary encryption provider (AES-CBC)
 - aescbc: # => AES-CBC with PKCS#7 padding
 # => Industry-standard symmetric encryption
 keys: # => Encryption keys (can have multiple for rotation)
 - name:
 key1 # => Key identifier (arbitrary name)
 # => Used for key rotation tracking
 # => Appears in etcd metadata
 secret:
 <base64-encoded-32-byte-key> # => 256-bit encryption key
 # => MUST be exactly 32 bytes before base64 encoding
 # => Generate: head -c 32 /dev/urandom | base64
 # => Example: ySPsowMmLdOJJHvQ7g5G2YQffQhBdVxfRPEADqpYZ9s=
 
 # Additional key for rotation (optional)
 - name:
 key2 # => Older key kept for decryption
 # => Data encrypted with key1, but key2 still decrypts old data
 # => Remove after all data re-encrypted with key1
 secret:
 <base64-encoded-old-key> # => Previous encryption key
 # => Enables zero-downtime key rotation
 
 # Fallback provider (no encryption)
 - identity: {} # => No-op encryption provider
 # => Stores data in plaintext

Setup steps (apply encryption to cluster):

# Step 1: Generate encryption key (256-bit = 32 bytes)
head -c 32 /dev/urandom | base64
# => Output: ySPsowMmLdOJJHvQ7g5G2YQffQhBdVxfRPEADqpYZ9s=
 
# Step 2: Create encryption config file on control plane node
sudo cat > /etc/kubernetes/encryption-config.yaml <<EOF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
 - resources:
 - secrets
 providers:
 - aescbc:
 keys:
 - name: key1
 secret: ySPsowMmLdOJJHvQ7g5G2YQffQhBdVxfRPEADqpYZ9s=
 - identity: {}
EOF
# => File created at /etc/kubernetes/encryption-config.yaml
 
# Step 3: Update kube-apiserver configuration
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
# => Add flag: --encryption-provider-config=/etc/kubernetes/encryption-config.yaml
 
# Step 4: Verify kube-apiserver restarted
kubectl get pods -n kube-system | grep kube-apiserver
# => kube-apiserver-controlplane 1/1 Running (recent restart time)
 
# Step 5: Encrypt ALL existing Secrets (re-write to trigger encryption)
kubectl get secrets --all-namespaces -o json | kubectl replace -f -
# => Reads all Secrets from etcd (decrypted if encrypted with old key)
 
# Step 6: Verify encryption (check etcd directly)
ETCDCTL_API=3 etcdctl get /registry/secrets/default/my-secret --print-value-only
# => Before encryption: plaintext "password123" visible

Alternative encryption providers:

# Option 1: AES-GCM (faster than AES-CBC, requires unique nonce)
providers:
 - aesgcm: # => AES-GCM authenticated encryption
 # => Faster than AES-CBC (hardware acceleration)
 keys:
 - name: key1
 secret: <base64-encoded-32-byte-key> # => 256-bit key
 - identity: {}
 
# Option 2: KMS provider (AWS KMS - enterprise grade)
providers:
 - kms: # => External Key Management Service
 # => Keys stored outside cluster (AWS KMS, GCP KMS, Azure Key Vault)
 name: aws-encryption-provider # => KMS provider name
 endpoint: unix:///var/run/kmsplugin/socket.sock # => KMS plugin socket
 # => Requires aws-encryption-provider daemonset
 cachesize: 1000 # => Cache decrypted keys (performance)
 # => Reduces KMS API calls
 timeout: 3s # => KMS request timeout
 # => Prevents kube-apiserver hang if KMS unavailable
 - identity: {} # => Fallback if KMS unavailable (disaster recovery)
 
# Option 3: Secretbox (NaCl library)
providers:
 - secretbox: # => XSalsa20-Poly1305 encryption
 # => Modern authenticated encryption
 keys:
 - name: key1
 secret: <base64-encoded-32-byte-key>
 - identity: {}

Key rotation procedure (zero-downtime):

# Step 1: Add new key to encryption config (keep old key)
providers:
 - aescbc:
 keys:
 - name: key2 # => New key (encrypts new data)
 secret: <base64-encoded-new-key>
 - name: key1 # => Old key (decrypts existing data)
 secret: <base64-encoded-old-key>
 - identity: {}
# => New Secrets encrypted with key2
 
# Step 2: Apply config and restart kube-apiserver
# => kubectl apply and wait for restart
 
# Step 3: Re-encrypt all Secrets with new key
kubectl get secrets --all-namespaces -o json | kubectl replace -f -
# => All Secrets now encrypted with key2
 
# Step 4: Remove old key from config
providers:
 - aescbc:
 keys:
 - name: key2 # => Only new key remains
 secret: <base64-encoded-new-key>
 - identity: {}
# => Old key removed safely (all data re-encrypted)

Verification:

# Check encryption config loaded
kubectl get --raw /api/v1 | jq '.encryption.providers'
# => Shows: aescbc, identity
 
# Test encryption (create Secret, check etcd)
kubectl create secret generic test-encryption --from-literal=key=value
 
# Read from etcd directly (encrypted)
ETCDCTL_API=3 etcdctl get /registry/secrets/default/test-encryption --print-value-only | hexdump -C
# => Binary encrypted data (k8s:enc:aescbc:v1:key1: prefix)
 
# Read via kubectl (automatically decrypted)
kubectl get secret test-encryption -o jsonpath='{.data.key}' | base64 -d
# => Output: value (decrypted)

Key Takeaway: Enable encryption at rest for production clusters storing sensitive data; use KMS providers (AWS KMS, GCP KMS) for enterprise key management and audit logging; rotate encryption keys periodically following security policies; always keep identity provider as fallback for disaster recovery.

Why It Matters: Encryption at rest protects against etcd backup theft and unauthorized database access (etcd contains all cluster secrets including database passwords, API tokens, and TLS certificates). Unencrypted etcd data represents a single point of total credential compromise — one stolen snapshot yields all production secrets in plaintext. KMS-backed encryption (AWS KMS, HashiCorp Vault) provides automated key rotation and audit logging required for PCI-DSS compliance, ensuring that even if physical media or backups are stolen, the data remains unreadable without the managed key material.

Network Policies (Examples 64-68)

Example 64: Basic Network Policy

NetworkPolicies control pod-to-pod traffic using label selectors and rules. By default, all traffic is allowed; NetworkPolicies enforce restrictions.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% NetworkPolicy traffic control
graph TD
 A[Pod: frontend<br/>app=frontend] -->|Allowed| B[Pod: backend<br/>app=backend]
 C[Pod: external<br/>app=external] -->|Denied| B
 D[Pod: database<br/>app=database] -->|Allowed| B
 
 style A fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style B fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff
 style C fill:#CC78BC,stroke:#000000,stroke-width:2px,color:#fff
 style D fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff

# NetworkPolicy: Pod-level firewall rules
apiVersion: networking.k8s.io/v1 # => NetworkPolicy API
kind: NetworkPolicy # => Firewall rules for Pods
metadata:
 name: backend-network-policy # => Policy name
 namespace:
 default # => Namespace-scoped (applies within this namespace only)
 # => NetworkPolicies don't cross namespace boundaries
 # => Need separate policies per namespace
spec:
 podSelector: # => Which Pods this policy applies to
 matchLabels:
 app:
 backend # => Selects Pods with label app=backend
 # => Policy applies to ALL Pods matching this selector
 # => Empty selector {} = all Pods in namespace
 # => Multiple labels = AND relationship
 
 policyTypes: # => Which traffic directions to control
 - Ingress # => Controls INCOMING traffic to selected Pods
 # => Default: allow all if not specified
 # => With Ingress: deny all, then explicit allow
 - Egress # => Controls OUTGOING traffic from selected Pods
 # => Default: allow all if not specified
 # => With Egress: deny all, then explicit allow
 
 ingress: # => Ingress rules (who can connect TO backend Pods)
 - from: # => Traffic sources (OR relationship between selectors)
 # Source 1: Frontend Pods
 - podSelector: # => Select Pods by labels (same namespace)
 matchLabels:
 app:
 frontend # => Allow traffic FROM Pods with app=frontend
 # => Pods in same namespace
 # => No namespaceSelector = same namespace only
 
 # Source 2: Database Pods
 - podSelector:
 matchLabels:
 app:
 database # => Allow traffic FROM Pods with app=database
 # => OR relationship with frontend selector above
 # => Either frontend OR database can connect
 
 ports: # => Allowed destination ports (applies to ALL from sources)
 - protocol:
 TCP # => Protocol (TCP, UDP, SCTP)
 # => Required field
 port:
 8080 # => Destination port on backend Pods
 # => Backend must listen on this port
 # => Source port unrestricted
 # => Result: frontend OR database → backend:8080/TCP allowed
 # => All other ingress traffic denied (default deny)
 
 egress: # => Egress rules (where backend Pods can connect TO)
 - to: # => Traffic destinations
 - podSelector:
 matchLabels:
 app:
 database # => Allow traffic TO Pods with app=database
 # => Backend can connect to database
 # => Same namespace only
 
 ports: # => Allowed destination ports
 - protocol: TCP
 port:
 5432 # => PostgreSQL default port
 # => Backend can connect to database:5432
 # => Result: backend → database:5432/TCP allowed
 # => All other egress traffic denied (including DNS!)
 
 # IMPORTANT: DNS egress required for service discovery
 - to: # => Additional egress rule for DNS
 - namespaceSelector: # => Select namespaces by labels
 matchLabels:
 name:
 kube-system # => kube-system namespace
 # => Where kube-dns/CoreDNS runs
 - podSelector:
 matchLabels:
 k8s-app: kube-dns # => Select DNS Pods
 ports: # => UDP port 53 for DNS
 - protocol: UDP # => UDP protocol (DNS uses UDP)
 port:
 53 # => DNS port
 # => Required for service name resolution
 # => Without this, backend can't resolve service names

Traffic flow summary:

# Allowed ingress (TO backend Pods):
# => frontend Pods → backend:8080/TCP ✓
 
# Allowed egress (FROM backend Pods):
# => backend → database:5432/TCP ✓

Testing the NetworkPolicy:

# Create test Pods with labels
kubectl run frontend --image=busybox --labels="app=frontend" -- sleep 3600
kubectl run backend --image=nginx --labels="app=backend" --port=8080
kubectl run database --image=postgres:15 --labels="app=database" --port=5432
kubectl run external --image=busybox --labels="app=external" -- sleep 3600
 
# Apply NetworkPolicy
kubectl apply -f backend-network-policy.yaml
# => NetworkPolicy created
 
# Test 1: frontend → backend:8080 (ALLOWED)
kubectl exec frontend -- wget -O- --timeout=2 backend:8080
# => SUCCESS: HTML response from nginx
 
# Test 2: external → backend:8080 (DENIED)
kubectl exec external -- wget -O- --timeout=2 backend:8080
# => TIMEOUT: no response
 
# Test 3: backend → database:5432 (ALLOWED)
kubectl exec backend -- nc -zv database 5432
# => SUCCESS: connection established
 
# Test 4: backend → external (DENIED if no rule)
kubectl exec backend -- wget -O- --timeout=2 external:8080
# => TIMEOUT: no response
 
# Verify NetworkPolicy applied
kubectl describe networkpolicy backend-network-policy
# => Shows: podSelector, ingress rules, egress rules
 
kubectl get pods -l app=backend -o wide
# => Shows backend Pods subject to NetworkPolicy

NetworkPolicy requirements:

# Requirement 1: CNI plugin with NetworkPolicy support
# => Supported: Calico, Cilium, Weave Net, Antrea, Romana
 
# Requirement 2: Pods must have labels
# => NetworkPolicy selects Pods by labels
 
# Requirement 3: Namespace-scoped
# => NetworkPolicies don't cross namespaces
 
# Behavior notes:
# => Without NetworkPolicy: ALL traffic allowed (permissive)

Key Takeaway: NetworkPolicies implement pod-to-pod firewall rules using label selectors; start with deny-all policy and add explicit allow rules; requires CNI plugin with NetworkPolicy support (Calico, Cilium, Weave); remember to allow DNS egress for service discovery.

Why It Matters: NetworkPolicies implement zero-trust networking required by PCI-DSS, HIPAA, and SOC 2 compliance frameworks. Lateral movement attacks demonstrate the risk of flat cluster networks — NetworkPolicies prevent compromised frontend Pods from accessing backend databases directly. This pattern enforces payment card industry segmentation (isolating payment processing from other services), reducing breach impact radius. Without NetworkPolicies, any Pod compromise can pivot to all cluster resources, violating defense-in-depth principles.

Example 65: Default Deny Network Policy

Default deny NetworkPolicies block all traffic to/from Pods, requiring explicit allow rules. This zero-trust approach improves security by denying unexpected traffic.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% Default deny with explicit allow
graph TD
 A[NetworkPolicy: Deny All] --> B[All Pods isolated]
 B --> C[Add allow rule<br/>frontend → backend]
 C --> D{Traffic allowed?}
 D -->|frontend → backend| E[Allowed]
 D -->|external → backend| F[Denied]
 D -->|frontend → database| F
 
 style A fill:#CC78BC,stroke:#000000,stroke-width:2px,color:#fff
 style C fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style E fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff
 style F fill:#CC78BC,stroke:#000000,stroke-width:2px,color:#fff

# Pattern 1: Deny all ingress traffic (allow egress)
apiVersion: networking.k8s.io/v1 # => NetworkPolicy API (stable)
kind: NetworkPolicy # => Pod-level firewall
metadata: # => NetworkPolicy metadata
 name: deny-all-ingress # => Policy name
 namespace: default # => Applies to default namespace only
spec: # => NetworkPolicy specification
 podSelector: {} # => Empty selector = ALL Pods in namespace
 # => Every Pod in default namespace affected
 policyTypes: # => Policy type declarations
 - Ingress # => Declares Ingress policy (default deny)
 # => NO ingress rules = deny all ingress
 # => Egress NOT declared = allow all egress (default)
 # ingress: [] field omitted = no ingress allowed
 # => Result: No Pod can receive traffic
 # => Exception: Pods can still connect OUT (egress allowed)
 
---
# => Document separator: second NetworkPolicy (Pattern 2: Deny all egress)
# Pattern 2: Deny all egress traffic (allow ingress)
apiVersion: networking.k8s.io/v1 # => NetworkPolicy API
kind: NetworkPolicy # => Egress-only denial policy
metadata: # => NetworkPolicy metadata
 name: deny-all-egress # => Policy name
 namespace: default # => Namespace scope
spec: # => NetworkPolicy specification
 podSelector: {} # => All Pods in namespace
 policyTypes: # => Policy type declarations
 - Egress # => Declares Egress policy (default deny)
 # => NO egress rules = deny all egress
 # => Ingress NOT declared = allow all ingress (default)
 # egress: [] field omitted = no egress allowed
 # => Result: No Pod can send traffic
 # => Exception: Pods can still receive traffic (ingress allowed)
 # => WARNING: Breaks DNS (Pods can't resolve service names)
 
---
# => Document separator: third NetworkPolicy (Pattern 3: complete isolation)
# Pattern 3: Deny all ingress AND egress (complete isolation)
apiVersion: networking.k8s.io/v1 # => NetworkPolicy API
kind: NetworkPolicy # => Complete isolation policy
metadata: # => NetworkPolicy metadata
 name: deny-all # => Policy name (both directions denied)
 namespace: default # => Namespace scope
spec: # => NetworkPolicy specification
 podSelector: {} # => All Pods in namespace
 policyTypes: # => Policy type declarations
 - Ingress # => Default deny ingress
 - Egress # => Default deny egress
 # NO ingress or egress rules = total isolation
 # => Result: Pods cannot send OR receive traffic
 # => Use case: emergency lockdown, compliance isolation
 # => WARNING: Breaks everything including DNS, health checks
 
---
# => Document separator: fourth NetworkPolicy (Pattern 4: deny-all with DNS preserved)
# Pattern 4: Default deny with DNS exception (recommended)
apiVersion: networking.k8s.io/v1 # => NetworkPolicy API
kind: NetworkPolicy # => Recommended production default deny
metadata: # => NetworkPolicy metadata
 name: deny-all-with-dns # => Recommended default deny (preserves DNS)
 namespace: default # => Apply this to every production namespace
spec: # => NetworkPolicy specification
 podSelector: {} # => All Pods
 policyTypes: # => Policy type declarations
 - Ingress # => Deny all ingress
 - Egress # => Deny all egress (except DNS below)
 egress: # => Egress rules (exceptions to default deny)
 # Allow DNS lookups (required for service discovery)
 - to: # => Destination selector for DNS exception
 - namespaceSelector: # => Select kube-system namespace
 matchLabels:
 kubernetes.io/metadata.name: kube-system # => Standard namespace label
 ports:
 - protocol: UDP
 port:
 53 # => DNS port
 # => kube-dns/CoreDNS in kube-system
 # Allow HTTPS to Kubernetes API (kubelet health checks)
 - to: # => Destination: any namespace
 - namespaceSelector: {} # => Any namespace
 ports: # => TCP port 443
 - protocol: TCP # => TCP protocol (HTTPS)
 port: 443 # => HTTPS for kubelet → API server
 # => Result: Deny all EXCEPT DNS and API access

Deployment workflow (safe default deny rollout):

# Step 1: Apply to test namespace
kubectl create namespace test-isolation                                           # => Creates isolated test namespace
kubectl label namespace test-isolation kubernetes.io/metadata.name=test-isolation # => Adds standard name label (required for namespaceSelector)
kubectl apply -f deny-all-with-dns.yaml -n test-isolation
# => Test namespace isolated (all traffic blocked except DNS)
 
# Step 2: Deploy test Pod
kubectl run test-app -n test-isolation --image=nginx
# => Pod created but network isolated (no traffic in or out except DNS)
 
# Step 3: Verify DNS works
kubectl exec -n test-isolation test-app -- nslookup kubernetes.default
# => Output: Server: 10.96.0.10  Name: kubernetes.default  Address: 10.96.0.1
# => SUCCESS (DNS allowed via egress rule to kube-system port 53)
 
# Step 4: Verify other traffic blocked
kubectl run test-client -n test-isolation --image=busybox --rm -it -- wget -O- --timeout=2 test-app
# => wget: download timed out  (blocked as expected by deny-all policy)
 
# Step 5: Add allow rule
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
 name: allow-to-nginx
 namespace: test-isolation
spec:
 podSelector:
 matchLabels:
 run: test-app
 ingress:
 - from:
 - podSelector: {}
EOF
# => Allow rule added
# => Policy created: allows all Pods in namespace to reach test-app
 
# Step 6: Retry connection
kubectl run test-client -n test-isolation --image=busybox --rm -it -- wget -O- test-app
# => SUCCESS (now allowed)

Key Takeaway: Apply default deny NetworkPolicies at namespace level for zero-trust security; always allow DNS egress to prevent breaking service discovery; create explicit allow rules for required traffic only; test thoroughly in non-production namespace before production rollout.

Why It Matters: Default deny NetworkPolicies implement zero-trust architecture principles, shifting security from perimeter-based (firewall at edge) to identity-based (verify every connection). This approach reduces insider threat risks and contains breach impact by limiting lateral movement. Implementing zero-trust at the network level is now a SOC 2 and ISO 27001 requirement for cloud-native organizations.

Example 66: Namespace Isolation with NetworkPolicy

NetworkPolicies can isolate namespaces, allowing traffic only from specific namespaces using namespaceSelector. This enforces environment separation (dev, staging, prod).

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% Namespace isolation
graph TD
 A[Namespace: backend-ns] --> B{NetworkPolicy}
 B --> C[Allow from<br/>frontend-ns]
 B --> D[Deny from<br/>dev-ns]
 C --> E[frontend Pod<br/>environment=frontend]
 D --> F[dev Pod<br/>environment=dev]
 E --> G[Access granted]
 F --> H[Access denied]
 
 style A fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style C fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff
 style D fill:#CC78BC,stroke:#000000,stroke-width:2px,color:#fff
 style G fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff
 style H fill:#CC78BC,stroke:#000000,stroke-width:2px,color:#fff

apiVersion: networking.k8s.io/v1 # => NetworkPolicy API (stable)
kind: NetworkPolicy # => Cross-namespace traffic control policy
metadata:
 name: allow-from-frontend-namespace # => Policy name (descriptive: source-ns → target-ns)
 namespace: backend-ns # => Policy lives in backend-ns (restricts traffic TO backend-ns Pods)
spec:
 podSelector:
 matchLabels:
 app: api # => Applies to api Pods in backend-ns (target Pods)
 # => Empty {} would apply to ALL Pods in namespace
 policyTypes:
 - Ingress # => Controls incoming traffic to selected Pods
 # => Only Ingress: egress from api Pods not restricted
 ingress:
 - from:
 # => namespaceSelector AND podSelector = AND relationship
 # => Both conditions must be true simultaneously
 - namespaceSelector:
 matchLabels:
 environment:
 frontend # => Allow from namespaces labeled environment=frontend
 # => kubectl label namespace frontend-ns environment=frontend
 # => Namespace must have this label BEFORE policy applies
 - podSelector:
 matchLabels:
 app: web # => AND only Pods labeled app=web in that namespace
 # => Both conditions required: correct namespace AND correct Pod label
 ports:
 - protocol: TCP # => TCP protocol
 port: 8080 # => Only port 8080 allowed (application API port)
 # => Other ports remain blocked
 
# Namespace isolation patterns:
# => Production namespace: allow only from prod namespaces
kubectl label namespace production environment=production  # => Label namespace for selector
kubectl label namespace staging environment=staging        # => Different label = isolated
kubectl label namespace development environment=dev        # => Dev isolated from prod
 
# Label namespaces for NetworkPolicy:
# => kubectl label namespace frontend-ns environment=frontend
# => kubectl get namespace frontend-ns --show-labels       # => Verify label applied

Key Takeaway: Use namespaceSelector for namespace-level isolation; label namespaces to define trust boundaries; combine podSelector and namespaceSelector for fine-grained cross-namespace access control.

Why It Matters: Namespace isolation prevents cross-environment contamination where development workloads access production databases — a leading cause of data leaks in clusters without proper segmentation. This pattern enables multi-tenancy where different teams share a cluster while maintaining security boundaries between regulated financial workloads and general engineering services. Namespace-based NetworkPolicies also enforce compliance segmentation (separating PCI workloads from non-PCI, HIPAA from non-HIPAA) required by auditors, reducing the scope of compliance certifications and associated costs.

Example 67: Egress to External Services

NetworkPolicies can control egress traffic to external IP addresses and DNS names, restricting outbound connections to approved services.

apiVersion:
 networking.k8s.io/v1 # => NetworkPolicy API
 # => Stable networking API
kind:
 NetworkPolicy # => NetworkPolicy resource
 # => Egress traffic control
metadata:
 # => Policy metadata
 name:
 allow-egress-external # => Policy name
 # => Descriptive identifier
 namespace:
 default # => Namespace scope
 # => Applies within this namespace
spec:
 # => NetworkPolicy specification
 podSelector:
 # => Selects target Pods
 matchLabels:
 # => Label selector
 app:
 api # => Applies to app=api Pods
 # => Restricts these Pods' egress
 policyTypes:
 # => Policy directions
 - Egress # => Controls outgoing traffic only
 # => Allows all ingress by default
 # => Egress default deny when specified
 egress:
 # => Egress rule list
 # Allow DNS resolution
 - to:
 # => Destination selector
 - namespaceSelector:
 # => Namespace label selector
 matchLabels:
 # => Namespace labels
 kubernetes.io/metadata.name:
 kube-system # => kube-system namespace
 # => DNS service location
 podSelector:
 # => Pod label selector
 matchLabels:
 # => Pod labels
 k8s-app:
 kube-dns # => Allow CoreDNS
 # => DNS service Pods
 # => Required for name resolution
 ports:
 # => Allowed ports
 - protocol:
 UDP # => UDP protocol
 # => DNS uses UDP
 port:
 53 # => DNS port
 # => Standard DNS port
 # => Essential for service discovery
 
 # Allow specific external IP
 - to:
 # => Destination IP range
 - ipBlock:
 # => IP block selector
 cidr:
 203.0.113.0/24 # => Allow to IP range
 # => /24 = 256 addresses
 # => Specific external service range
 except:
 # => Exclusion list
 - 203.0.113.10/32 # => Except this IP
 # => /32 = single IP
 # => Blocked within allowed range
 ports:
 # => Allowed ports
 - protocol:
 TCP # => TCP protocol
 # => HTTPS uses TCP
 port:
 443 # => HTTPS only
 # => Secure connection required
 # => Blocks unencrypted HTTP
 
 # Allow to external service
 - to:
 # => Destination IP range
 - ipBlock:
 # => IP block selector
 cidr:
 0.0.0.0/0 # => Allow to any IP (internet)
 # => All external IPs
 # => Broad internet access
 ports:
 # => Allowed ports
 - protocol:
 TCP # => TCP protocol
 # => HTTPS uses TCP
 port:
 443 # => HTTPS to internet
 # => Secure connections only
 # => No HTTP (port 80)
 
# Egress control use cases:
# => Restrict outbound to approved APIs

Key Takeaway: Control egress traffic to external services using ipBlock; always allow DNS (port 53) for name resolution; use CIDR ranges to restrict outbound access to approved IP addresses and services.

Why It Matters: Egress policies prevent data exfiltration attacks where compromised Pods send secrets to attacker-controlled servers — a key tactic in supply chain attacks. Strict egress whitelisting is a financial regulatory requirement (PCI-DSS requires documented network segmentation) and reduces incident response costs by limiting blast radius. Egress controls block ransomware command-and-control traffic by allowing Pods to communicate only with explicitly approved external endpoints.

Example 68: NetworkPolicy with Multiple Selectors

NetworkPolicies support complex rules combining podSelector, namespaceSelector, and ipBlock for fine-grained traffic control.

apiVersion:
 networking.k8s.io/v1 # => Networking API group
 # => Stable v1 NetworkPolicy API
 # => NetworkPolicies since Kubernetes 1.7
kind:
 NetworkPolicy # => NetworkPolicy resource
 # => Defines traffic rules for Pods
 # => Firewall rules at Pod level
metadata:
 name:
 complex-policy # => Policy name
 # => Unique within namespace
 namespace:
 production # => Namespace scope
 # => Policy applies only to production namespace
spec:
 # => Policy specification
 podSelector:
 # => Selects Pods this policy applies to
 matchLabels:
 # => Label-based selection (AND logic)
 tier:
 backend # => Matches backend tier Pods
 # => First label requirement
 environment:
 prod # => Applies to Pods with both labels
 # => Second label requirement
 # => Both tier=backend AND environment=prod
 policyTypes:
 # => Traffic directions controlled
 - Ingress # => Controls incoming traffic
 # => Defines what can connect TO these Pods
 - Egress # => Controls outgoing traffic
 # => Defines where these Pods can connect
 ingress:
 # => Ingress rules (OR logic between rules)
 # Rule 1: Allow from frontend in same namespace
 - from:
 # => Traffic sources (OR logic between items)
 - podSelector:
 # => Pods in same namespace
 matchLabels:
 tier:
 frontend # => Allow from frontend Pods
 # => Same namespace as policy
 ports:
 # => Allowed ports (AND logic with from)
 - protocol:
 TCP # => TCP protocol
 # => Alternative: UDP, SCTP
 port:
 8080 # => Application port
 # => Backend API endpoint
 
 # Rule 2: Allow from monitoring namespace
 - from:
 # => Different source selector
 - namespaceSelector:
 # => Pods from specific namespace
 matchLabels:
 name:
 monitoring # => Monitoring namespace
 # => Cross-namespace access
 ports:
 # => Monitoring port
 - protocol:
 TCP # => TCP protocol
 # => HTTP/gRPC common
 port:
 9090 # => Metrics endpoint
 # => Prometheus scrape port
 # => Separate rule for monitoring
 
 # Rule 3: Allow from specific IP range
 - from:
 # => IP-based source
 - ipBlock:
 # => CIDR-based access control
 cidr:
 10.0.0.0/8 # => Internal network
 # => RFC 1918 private network
 # => Corporate network range
 ports:
 # => Application port
 - protocol:
 TCP # => TCP protocol
 # => Standard application traffic
 port:
 8080 # => Backend API port
 # => Same port as Rule 1
 
 egress:
 # => Egress rules (OR logic)
 # Allow to database
 - to:
 # => Egress destinations
 - podSelector:
 # => Database Pods in same namespace
 matchLabels:
 tier:
 database # => Database tier Pods
 # => Backend can connect to database
 ports:
 # => Database port
 - protocol:
 TCP # => TCP protocol
 # => PostgreSQL wire protocol
 port:
 5432 # => PostgreSQL port
 # => Standard PostgreSQL port
 
 # Allow to external API
 - to:
 # => External destination
 - ipBlock:
 # => Specific external IP
 cidr:
 203.0.113.50/32 # => Specific external API
 # => /32 = single IP
 # => Whitelisted external service
 ports:
 # => HTTPS port
 - protocol:
 TCP # => TCP protocol
 # => HTTPS over TCP
 port:
 443 # => HTTPS port
 # => Encrypted external API
 
# Multiple selector behavior:
# => Multiple items in from/to are OR'd (any match allows)

Key Takeaway: Combine multiple selectors for complex traffic rules; understand OR (multiple from/to items) vs AND (multiple selectors in same item) semantics; test policies thoroughly in non-production before applying to production.

Why It Matters: Micro-segmentation with complex NetworkPolicies enables zero-trust architecture at scale, isolating payment processing, trading systems, and customer data to meet regulatory requirements (PCI-DSS, SOX, GDPR). This pattern reduces breach impact radius compared to flat networks — lateral movement exploits that compromise entire clusters are prevented when workloads can only communicate through explicitly allowed paths. ---

Custom Resources & Operators (Examples 69-73)

Example 69: Custom Resource Definition (CRD)

CustomResourceDefinitions extend Kubernetes API with custom resource types. CRDs define schema, validation, and versions for custom resources.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% CRD and custom resource relationship
graph TD
 A[CRD: databases.example.com] --> B[Defines schema]
 B --> C[Custom Resource: Database]
 C --> D[Instance: production-db]
 D --> E[spec.engine: postgres<br/>spec.version: 15<br/>spec.replicas: 3]
 A --> F[Validation rules]
 F --> G[Accepted by API server]
 
 style A fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style C fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style D fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff

# CustomResourceDefinition: Extend Kubernetes API with custom types
apiVersion: apiextensions.k8s.io/v1 # => CRD API version
kind: CustomResourceDefinition # => Defines new custom resource type
metadata: # => CRD metadata (cluster-scoped, no namespace)
 name:
 databases.example.com # => CRD name format: <plural>.<group>
 # => Must match spec.names.plural + spec.group
 # => Cluster-scoped (no namespace)
spec: # => CRD specification
 group:
 example.com # => API group for custom resources
 # => Appears in apiVersion: example.com/v1
 # => Use reverse domain (com.example, io.mydomain)
 # => Avoid k8s.io, kubernetes.io (reserved)
 
 names: # => Resource naming (used in kubectl and API)
 kind: # => Singular capitalized resource name
 Database # => Resource kind (singular, PascalCase)
 # => Used in YAML: kind: Database
 # => Convention: singular, capitalized
 plural: # => Plural resource name
 databases # => Plural form (lowercase)
 # => Used in API path: /apis/example.com/v1/databases
 # => Used in kubectl: kubectl get databases
 singular: # => Singular resource name (used in kubectl get)
 database # => Singular form (lowercase)
 # => Used in kubectl: kubectl get database
 shortNames: # => Short aliases for kubectl
 - db # => kubectl get db (shorthand)
 # => Easier to type than kubectl get databases
 listKind: # => List type name (auto-generated if omitted)
 DatabaseList # => List kind (optional, auto-generated)
 # => Used for list operations
 categories: # => Custom categories for grouping
 - all # => Included in kubectl get all
 
 scope: # => Whether resources are namespaced or cluster-scoped
 Namespaced # => Resource scope
 # => Namespaced: resources belong to namespaces
 # => Cluster: cluster-wide resources (like nodes)
 # => Most CRDs are Namespaced
 
 versions: # => API versions (can have multiple for compatibility)
 - name: # => Version name field (e.g., v1, v1beta1, v1alpha1)
 v1 # => Version identifier
 # => Appears in apiVersion: example.com/v1
 # => Convention: v1, v1alpha1, v1beta1, v2
 served: # => Whether this version is served by API server
 true # => Enable this version in API
 # => false = version exists but disabled
 # => Can serve multiple versions simultaneously
 storage: # => Designated storage version (only one allowed)
 true # => Storage version (etcd persistence)
 # => MUST have exactly ONE storage version
 # => Other versions converted to storage version
 # => Migration changes storage version
 
 schema: # => OpenAPI v3 validation schema
 openAPIV3Schema: # => JSON Schema for validation
 type: object # => Root must be object
 required: ["spec"] # => Top-level required fields
 properties: # => Field definitions (spec and status)
 spec: # => Desired state (user-defined)
 type: object # => Spec is always object
 required: # => Required fields in spec
 ["engine", "version"] # => Required spec fields
 # => API rejects resources missing these
 properties: # => Spec field definitions
 engine: # => Database engine field
 type: string # => Field type
 enum:
 ["postgres", "mysql", "mongodb"] # => Allowed values
 # => Validation rejects invalid values
 description: "Database engine type" # => Field documentation
 version: # => Database version field
 type:
 string # => Version as string (not int)
 # => Supports semver: "15", "15.2", "15.2.1"
 pattern:
 '^[0-9]+(\.[0-9]+)*$' # => Regex validation
 # => Ensures version format
 replicas: # => Replica count field
 type: integer # => Integer type
 minimum: 1 # => Validation: must be >= 1
 maximum: 10 # => Validation: must be <= 10
 default: 3 # => Default value if not specified
 storage: # => Storage configuration (optional object field)
 type: object # => Nested object type
 properties:
 size:
 type: string # => Storage size as string
 pattern: "^[0-9]+(Gi|Mi)$" # => Examples: 10Gi, 500Mi (regex validated)
 storageClass:
 type: string # => Kubernetes StorageClass name for PV provisioning
 
 status: # => Actual state (controller-managed, NOT user-editable)
 type: object # => Status is always object type
 properties: # => Status field definitions
 phase: # => Current lifecycle phase
 type: string # => String field
 enum: ["Pending", "Running", "Failed"] # => Valid phases (validated by API server)
 readyReplicas: # => Number of ready replicas
 type: integer # => Integer count
 conditions: # => Standard Kubernetes condition array (follows convention)
 type: array # => Array of condition objects
 items: # => Each item in the array
 type: object # => Each condition is an object
 properties: # => Condition object fields
 type: # => Condition type field
 type: string # => Condition type (e.g., "Ready", "Degraded")
 status: # => Condition status field
 type: string # => Condition state string
 enum: ["True", "False", "Unknown"] # => Standard Kubernetes values
 lastTransitionTime: # => Timestamp field
 type: string # => Timestamp as string
 format: date-time # => RFC3339 timestamp format (validated)
 reason: # => Machine-readable reason field
 type: string # => Machine-readable reason code (e.g., "Pulling")
 message: # => Human-readable message field
 type: string # => Human-readable explanation of condition
 
 # Subresources (optional but recommended)
 subresources:
 status: {} # => Enable status subresource
 # => Separate /status endpoint
 scale: # => Enable scale subresource
 specReplicasPath: .spec.replicas # => Path to replica count in spec
 statusReplicasPath: .status.readyReplicas # => Path in status
 labelSelectorPath: .status.labelSelector # => Label selector path
 # => Enables: kubectl scale database/production-db --replicas=5
 
 # Additional printer columns (kubectl get output)
 additionalPrinterColumns:
 - name: Engine # => Column name (shown in kubectl get output header)
 type: string # => Data type (string, integer, date, boolean)
 jsonPath: .spec.engine # => JSONPath to field in resource
 description: "Database engine type" # => Column description (in kubectl explain)
 - name: Version
 type: string # => Version as string
 jsonPath: .spec.version # => Path to spec.version field
 - name: Replicas
 type: integer # => Numeric column
 jsonPath: .spec.replicas # => Desired replica count
 - name: Ready
 type: integer # => Count of ready replicas
 jsonPath: .status.readyReplicas # => Status field (controller-populated)
 - name: Phase
 type: string # => Current lifecycle phase
 jsonPath: .status.phase # => Status phase field
 - name: Age
 type: date # => Date type (auto-formatted as "5d" or "2h")
 jsonPath: .metadata.creationTimestamp # => Standard creation time field
 # => Result: kubectl get databases shows Engine/Version/Replicas/Ready/Phase/Age columns
 
---
# => Document separator: custom resource instance follows (created by operator)
# Custom Resource instance (using CRD above)
apiVersion: example.com/v1 # => Uses CRD API group and version
kind: Database # => Uses CRD kind (defined by CRD above)
metadata:
 name: production-db # => Resource name (unique within namespace)
 namespace: default # => Namespace (if scope: Namespaced)
spec:
 engine: postgres # => Validated against enum: [postgres, mysql, mongodb]
 version: "15" # => String version matching pattern: ^[0-9]+(\.[0-9]+)*$
 replicas: 3 # => Integer within min:1 and max:10 (default:3)
 storage:
 size: 50Gi # => Matches pattern: ^[0-9]+(Gi|Mi)$
 storageClass: fast-ssd # => StorageClass name for provisioning
 
# Initially, status is empty (controller populates)
# status:
# phase: Pending
# readyReplicas: 0
# conditions: []

CRD lifecycle operations:

# Create CRD (defines new resource type)
kubectl apply -f database-crd.yaml
# => customresourcedefinition.apiextensions.k8s.io/databases.example.com created
 
# Verify CRD created
kubectl get crd databases.example.com
# => NAME                          CREATED AT
# => databases.example.com         2026-03-07T05:00:00Z
 
# Check CRD details
kubectl describe crd databases.example.com
# => Shows: group, versions, scope, schema, conversion, printer columns
 
# Create custom resource instance
kubectl apply -f production-db.yaml
# => database.example.com/production-db created
 
# List custom resources
kubectl get databases
# => NAME ENGINE VERSION REPLICAS READY PHASE
 
# Get specific resource
kubectl get database production-db -o yaml
# => Shows full YAML with spec and status
 
# Use short name
kubectl get db production-db
# => Same as kubectl get database
 
# Describe resource
kubectl describe database production-db
# => Human-readable details including events
 
# Update resource (change replicas)
kubectl patch database production-db --type=merge -p '{"spec":{"replicas":5}}'
# => database.example.com/production-db patched
 
# Update status (requires status subresource)
kubectl patch database production-db --subresource=status --type=merge -p '{"status":{"phase":"Running","readyReplicas":5}}'
# => Updates status independently from spec
 
# Scale resource (requires scale subresource)
kubectl scale database production-db --replicas=7
# => database.example.com/production-db scaled
 
# Delete custom resource
kubectl delete database production-db
# => database.example.com "production-db" deleted
 
# Delete CRD (removes ALL instances!)
kubectl delete crd databases.example.com
# => customresourcedefinition.apiextensions.k8s.io "databases.example.com" deleted

CRD validation examples:

# Valid resource (passes validation)
kubectl apply -f - <<EOF
apiVersion: example.com/v1 # => Uses CRD API group + version
kind: Database # => CRD-defined resource kind
metadata:
 name: valid-db # => Resource instance name
spec:
 engine: postgres # => Valid: matches enum [postgres, mysql, mongodb]
 version: "15.2" # => Valid: matches pattern ^[0-9]+(\.[0-9]+)*$
 replicas: 3 # => Valid: within min:1 max:10 range
EOF
# => database.example.com/valid-db created
 
# Invalid: missing required field
kubectl apply -f - <<EOF
apiVersion: example.com/v1 # => Uses CRD API group + version
kind: Database # => CRD-defined resource kind
metadata:
 name: invalid-db # => Resource name
spec:
 engine: postgres # => Valid engine value
 # version missing!
 replicas: 3 # => Valid but version field missing
EOF
# => Error: spec.version is required
 
# Invalid: enum violation
kubectl apply -f - <<EOF
apiVersion: example.com/v1 # => Uses CRD API group + version
kind: Database # => CRD-defined resource kind
metadata:
 name: invalid-db # => Resource name
spec:
 engine: oracle # Not in enum! # => INVALID: oracle not in [postgres, mysql, mongodb]
 version: "19" # => Valid version format
 replicas: 3 # => Valid replica count
EOF
# => Error: spec.engine must be one of: postgres, mysql, mongodb
 
# Invalid: out of range
kubectl apply -f - <<EOF
apiVersion: example.com/v1 # => Uses CRD API group + version
kind: Database # => CRD-defined resource kind
metadata:
 name: invalid-db # => Resource name
spec:
 engine: postgres # => Valid engine value
 version: "15" # => Valid version format
 replicas: 100 # Exceeds maximum: 10 # => INVALID: maximum is 10
EOF
# => Error: spec.replicas must be <= 10

Key Takeaway: Use CRDs to extend Kubernetes with domain-specific resources; define comprehensive OpenAPI schema with validation rules (required, enum, min/max, patterns); enable subresources (status, scale) for standard Kubernetes patterns; add printer columns for better kubectl output; implement controllers (operators) to reconcile custom resources to desired state.

Why It Matters: CRDs power the operator pattern used by CloudNativePG, Prometheus Operator, and ArgoCD to manage complex stateful applications declaratively. CRDs enable platform teams to expose higher-level abstractions (Database, Certificate, Application) instead of low-level Kubernetes primitives, improving developer productivity while enforcing organizational standards (backups, monitoring, security) automatically. The Kubernetes ecosystem has 200+ production-ready operators built on CRDs, covering databases, certificates, service meshes, and more.

Example 70: Custom Resource with Subresources

CRDs support subresources like status and scale, enabling kubectl commands and standard Kubernetes patterns.

apiVersion:
 apiextensions.k8s.io/v1 # => CRD API version
 # => Stable v1 API
kind:
 CustomResourceDefinition # => Defines custom resource type
 # => Extends Kubernetes API
metadata:
 # => CRD metadata
 name:
 applications.example.com # => CRD name format: <plural>.<group>
 # => Must match spec.names.plural + spec.group
spec:
 # => CRD specification
 group:
 example.com # => API group for custom resources
 # => Appears in apiVersion: example.com/v1
 names:
 # => Resource naming
 kind:
 Application # => Resource kind (singular, PascalCase)
 # => Used in YAML: kind: Application
 plural:
 applications # => Plural form (lowercase)
 # => Used in API path and kubectl
 scope:
 Namespaced # => Resource scope
 # => Resources belong to namespaces
 versions:
 # => API versions
 - name:
 v1 # => Version identifier
 # => Appears in apiVersion: example.com/v1
 served:
 true # => Enable this version in API
 # => Version is active
 storage:
 true # => Storage version (etcd persistence)
 # => MUST have exactly ONE storage version
 schema:
 # => OpenAPI v3 validation schema
 openAPIV3Schema:
 # => JSON Schema for validation
 type:
 object # => Root must be object
 # => Top-level structure
 properties:
 # => Field definitions
 spec:
 # => Desired state (user-defined)
 type:
 object # => Spec is always object
 # => Contains user configuration
 properties:
 # => Spec fields
 image:
 # => Container image field
 type:
 string # => Field type
 # => Image tag or digest
 replicas:
 # => Replica count field
 type:
 integer # => Integer type
 # => Number of desired instances
 status:
 # => Actual state (controller-managed)
 type:
 object # => Status is always object
 # => Current state of resource
 properties:
 # => Status fields
 availableReplicas:
 # => Number of ready replicas
 type:
 integer # => Integer type
 # => Actual running instances
 conditions:
 # => Status conditions array
 type:
 array # => Array of condition objects
 # => Standard Kubernetes pattern
 
 subresources:
 # => Subresources enable kubectl integration
 status: {} # => Enable status subresource
 # => Status updates separate from spec
 scale: # => Enable scale subresource
 # => Enables kubectl scale and HPA integration
 specReplicasPath:
 .spec.replicas # => Path to replica count in spec
 # => JSONPath to desired replicas
 statusReplicasPath:
 # => Path to replica count in status
 .status.availableReplicas # => JSONPath to current replicas
 # => kubectl scale application/myapp --replicas=5
 # => HorizontalPodAutoscaler integration
 
# Subresource benefits:
# => status: separate permissions (controller updates, users read-only)

Key Takeaway: Enable status subresource for separation of spec and status updates; enable scale subresource for kubectl scale integration; subresources follow standard Kubernetes patterns improving UX.

Why It Matters: Status subresources implement Kubernetes' level-triggered reconciliation pattern, critical for building reliable operators. This separation prevents race conditions where user spec changes conflict with controller status updates (a common bug in early Kubernetes operators). Companies like Red Hat (OpenShift Operators) and VMware (Tanzu) standardized on status subresources for all custom controllers, improving operator stability and enabling GitOps workflows where git only tracks spec (desired state) and controllers independently update status. The scale subresource enables HorizontalPodAutoscaler to manage custom resources, allowing auto-scaling of databases and stateful applications just like Deployments.

Example 71: Operator Pattern with Controller

Operators are custom controllers watching custom resources and reconciling actual state to desired state. This example shows a basic operator structure.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% Operator reconciliation loop
graph TD
 A[Watch Database CR] --> B{Event?}
 B -->|Create/Update| C[Read desired state<br/>spec.replicas=3]
 C --> D[Read actual state<br/>StatefulSet replicas=2]
 D --> E{Reconcile needed?}
 E -->|Yes| F[Update StatefulSet<br/>replicas=3]
 E -->|No| G[Update status]
 F --> G
 G --> A
 
 style A fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style C fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style D fill:#CA9161,stroke:#000000,stroke-width:2px,color:#fff
 style F fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff

# Watch all custom resources of type Database
kubectl get databases -w                                  # => Streams Database resource changes
                                                          # => Shows NAME, ENGINE, VERSION, REPLICAS, READY, PHASE columns
 
# View operator logs (reconciliation loop output)
kubectl logs -n operators deploy/database-operator -f    # => Follow operator log stream
                                                          # => Logs show reconciliation events and state diffs
 
# Trigger reconciliation by changing spec (operator acts automatically)
kubectl patch database production-db --type=merge -p '{"spec":{"replicas":5}}'
# => database.example.com/production-db patched
# => Operator detects change via watch → reconcile loop runs
 
# Check StatefulSet created by operator
kubectl get statefulset -l managed-by=database-operator  # => Lists operator-managed StatefulSets
                                                          # => Shows production-db with READY and AGE columns
 
# View operator-managed status (updated by operator)
kubectl get database production-db -o jsonpath='{.status}' # => Shows operator-populated status
# => {"phase":"Running","readyReplicas":5}
 
# Scaffold new operator with Operator SDK
operator-sdk init --domain example.com --repo github.com/example/database-operator
# => Creates operator project structure: main.go, go.mod, Makefile, config/
 
# Generate CRD scaffold
operator-sdk create api --group databases --version v1 --kind Database --resource --controller
# => Generates api/v1/database_types.go and controllers/database_controller.go
 
# Build and push operator image
make docker-build docker-push IMG=myregistry/database-operator:v1.0.0
# => Builds operator Docker image and pushes to registry
 
# Deploy operator to cluster
kubectl apply -f deploy/                                 # => Apply all operator manifests
# => serviceaccount/database-operator created

Key Takeaway: Operators automate operational tasks by watching custom resources and reconciling state; use operator frameworks (Operator SDK, Kubebuilder) for production operators; operators enable self-service platforms and complex application lifecycle management.

Why It Matters: The operator pattern encodes operational expertise in code — instead of runbooks and manual procedures, operators continuously enforce best practices (backups, monitoring, upgrades) automatically. Tools like CloudNativePG and the Prometheus Operator demonstrate this pattern managing complex stateful workloads at scale. The CNCF Operator Framework has enabled platform teams to build internal developer platforms where developers self-serve databases, message queues, and caches through simple YAML, reducing provisioning time from weeks to minutes.

Example 72: Operator Lifecycle Manager (OLM)

Operator Lifecycle Manager manages operator installation, upgrades, and dependencies. OLM provides operator catalog, versioning, and RBAC.

# Install OLM:
# => curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.25.0/install.sh | bash -s v0.25.0
 
# Operator CSV (ClusterServiceVersion)
apiVersion:
 operators.coreos.com/v1alpha1 # => OLM API group
 # => Operator Lifecycle Manager
 # => Manages operator installation and upgrades
kind:
 ClusterServiceVersion # => CSV resource type
 # => Operator metadata and installation spec
 # => OLM uses CSV for operator management
metadata:
 name:
 database-operator.v1.0.0 # => CSV name with version
 # => Format: name.vX.Y.Z
 # => Unique identifier for this version
 namespace:
 operators # => Namespace for operator
 # => OLM operators namespace
 # => All operators installed here
spec:
 # => CSV specification
 displayName:
 Database Operator # => Human-readable operator name
 # => Shown in OperatorHub UI
 version:
 1.0.0 # => Semantic version
 # => Used for upgrade path
 # => OLM tracks version progression
 description:
 Manages PostgreSQL databases # => Operator description
 # => Shown in catalog
 # => Explains operator purpose
 keywords:
 # => Search keywords for OperatorHub
 - database # => First keyword
 # => Helps users find operator
 - postgres # => Second keyword
 # => Database type
 maintainers:
 # => Operator maintainer list
 - name:
 Platform Team # => Maintainer name
 # => Team or individual
 email:
 [email protected] # => Contact email
 # => Support contact
 provider:
 # => Organization providing operator
 name:
 Example Corp # => Provider name
 # => Company or project
 icon:
 # => Operator icon for UI
 - base64data:
 <base64-encoded-icon> # => Base64-encoded icon
 # => PNG or SVG image
 mediatype:
 image/png # => Icon format
 # => MIME type
 customresourcedefinitions:
 # => CRDs owned by operator
 owned:
 # => CRDs this operator creates
 - name:
 databases.example.com # => CRD full name
 # => API group + plural
 version:
 v1 # => CRD version
 # => API version
 kind:
 Database # => Resource kind
 # => Singular name
 displayName:
 Database # => UI display name
 # => Human-readable kind
 description:
 PostgreSQL Database instance # => CRD description
 # => Resource purpose
 install:
 # => Installation strategy
 strategy:
 deployment # => Deployment-based install
 # => OLM creates Deployment for operator
 spec:
 # => Installation specification
 deployments:
 # => Deployment definitions
 - name:
 database-operator # => Operator Deployment name
 # => Created by OLM
 spec:
 # => Deployment spec
 replicas:
 1 # => Single operator instance
 # => One controller per cluster
 selector:
 # => Pod selector
 matchLabels:
 name:
 database-operator # => Selector label
 # => Matches template labels
 template:
 # => Pod template
 metadata:
 labels:
 # => Pod labels
 name:
 database-operator # => Pod label
 # => Matches selector
 spec:
 # => Pod specification
 serviceAccountName:
 database-operator # => ServiceAccount for operator
 # => OLM creates automatically
 # => RBAC permissions
 containers:
 # => Container list
 - name:
 operator # => Operator container
 # => Controller implementation
 image:
 example.com/database-operator:v1.0.0 # => Operator image
 # => Controller binary
 # => Version-specific image
 
# Install operator via OLM:
# => kubectl create -f database-operator-csv.yaml
 
# Upgrade operator:
# => Create new CSV with version v1.1.0

Key Takeaway: Use OLM for production operator management; OLM handles installation, upgrades, RBAC, and dependencies automatically; publish operators to OperatorHub for community distribution.

Why It Matters: OLM solves operator dependency hell and upgrade risks that plagued early Kubernetes operators (incompatible CRD versions, RBAC conflicts, manual installation steps). Companies like Red Hat (OpenShift) and SUSE (Rancher) standardized on OLM for distributing 200+ certified operators, reducing installation failures by 95% through automated dependency resolution and validation. OLM's channel-based updates enable controlled rollout strategies (test in alpha channel, promote to stable after validation) critical for production stability - the Postgres Operator community uses OLM to deliver monthly updates to 5000+ installations without manual intervention, improving security patch deployment time from weeks to hours.

Example 73: Admission Webhooks

Admission webhooks intercept API requests before persistence, enabling validation and mutation. Use webhooks for custom policies and automatic resource modification.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% Admission webhook flow
graph TD
 A[kubectl create pod] --> B[API Server]
 B --> C[Authentication]
 C --> D[Authorization]
 D --> E{Mutating Webhooks}
 E --> F[Inject sidecar]
 F --> G{Validating Webhooks}
 G -->|Valid| H[Persist to etcd]
 G -->|Invalid| I[Reject: 403]
 
 style B fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style E fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style G fill:#CA9161,stroke:#000000,stroke-width:2px,color:#fff
 style H fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff
 style I fill:#CC78BC,stroke:#000000,stroke-width:2px,color:#fff

apiVersion:
 admissionregistration.k8s.io/v1 # => Admission registration API group
 # => Stable v1 API for webhooks
 # => Admission webhooks since Kubernetes 1.16
kind:
 ValidatingWebhookConfiguration # => Validating webhook resource
 # => Intercepts API requests for validation
 # => Can approve or reject requests
metadata:
 name:
 pod-validator # => Webhook configuration name
 # => Unique cluster-wide identifier
spec:
 # => Webhook configuration
webhooks:
 # => List of webhook endpoints
 - name:
 pod-validator.example.com # => Webhook name (must be FQDN)
 # => Unique webhook identifier
 # => DNS-style naming convention
 clientConfig:
 # => How API server contacts webhook
 service:
 # => Service-based endpoint (in-cluster)
 name:
 webhook-service # => Service name
 # => Routes to webhook server Pods
 namespace:
 default # => Service namespace
 # => Where webhook service runs
 path:
 /validate # => HTTP path for validation
 # => Webhook server endpoint
 caBundle:
 <base64-ca-cert> # => CA certificate for TLS
 # => Validates webhook server certificate
 # => Required for HTTPS
 # => Base64-encoded CA cert
 rules:
 # => When to invoke webhook
 - operations:
 ["CREATE", "UPDATE"] # => Trigger on CREATE and UPDATE
 # => Validates new and modified Pods
 # => DELETE and CONNECT also possible
 apiGroups:
 [""] # => Core API group (v1)
 # => Empty string = core resources
 apiVersions:
 ["v1"] # => API version
 # => Matches v1 Pods
 resources:
 ["pods"] # => Intercept Pod create/update
 # => Webhook validates Pod requests
 # => Can specify subresources: pods/status
 admissionReviewVersions:
 ["v1"] # => AdmissionReview API version
 # => Webhook must support v1
 # => Sent in webhook request/response
 sideEffects:
 None # => Webhook has no side effects
 # => Idempotent validation only
 # => Required: None, NoneOnDryRun
 timeoutSeconds:
 5 # => Webhook timeout: 5 seconds
 # => API server waits max 5s
 # => Failure after timeout
 # => Default: 10 seconds
 
 
# Webhook server response:
# {
# "apiVersion": "admission.k8s.io/v1",
# => AdmissionReview API version
# "kind": "AdmissionReview",
# => Response type
# "response": {
# => Validation result
# "uid": "<request-uid>",
# => Matches request UID
# "allowed": true/false, # => Allow or reject
# => true: admit request
# "status": {
# => Rejection details (if allowed=false)
# "message": "Rejection reason"
# => Human-readable error message
# }
# }
# }
 
---
apiVersion:
 admissionregistration.k8s.io/v1 # => Admission registration API
 # => Same API as ValidatingWebhook
kind:
 MutatingWebhookConfiguration # => Mutating webhook resource
 # => Modifies requests before admission
 # => Runs BEFORE validating webhooks
metadata:
 name:
 pod-mutator # => Webhook configuration name
 # => Unique cluster-wide
webhooks:
 # => Webhook endpoint list
 - name:
 pod-mutator.example.com # => Webhook FQDN
 # => Unique identifier
 clientConfig:
 # => Webhook endpoint configuration
 service:
 # => In-cluster Service endpoint
 name:
 webhook-service # => Service name
 # => Same service as validator
 namespace:
 default # => Service namespace
 # => Where webhook runs
 path:
 /mutate # => HTTP path for mutation
 # => Different endpoint than validator
 caBundle:
 <base64-ca-cert> # => CA certificate for TLS
 # => Validates server identity
 rules:
 # => When to invoke webhook
 - operations:
 ["CREATE"] # => Only on CREATE
 # => Mutates new Pods only
 # => No UPDATE mutations
 apiGroups:
 [""] # => Core API group
 # => v1 resources
 apiVersions:
 ["v1"] # => API version v1
 # => Matches Pod API version
 resources:
 ["pods"] # => Mutates Pods
 # => Can modify Pod spec before admission
 admissionReviewVersions:
 ["v1"] # => AdmissionReview v1 API
 # => Webhook request/response format
 sideEffects:
 None # => No side effects
 # => Mutation only (no external calls)
 
# Webhook mutation response:
# {
# => Mutation response structure
# "response": {
# => AdmissionReview response
# "uid": "<request-uid>",
# => Matches request UID
# "allowed": true,
# => Must allow for mutation
# "patchType": "JSONPatch",
# => Patch format type
# "patch": "<base64-json-patch>" # => JSON Patch to apply
# => Base64-encoded patch operations
# }
# }
 
# Common webhook use cases:
# => Validation: enforce naming conventions, require labels

Key Takeaway: Use validating webhooks for custom policy enforcement beyond built-in admission controllers; use mutating webhooks for automatic resource modification like sidecar injection; webhook failures block API requests by default (set failurePolicy for control).

Why It Matters: Admission webhooks power policy enforcement at scale — OPA Gatekeeper and Kyverno use webhooks to enforce policies across entire clusters, preventing misconfigurations before they reach production (rejecting Pods without resource limits, enforcing image signing requirements, blocking privileged containers). Mutating webhooks enable platform teams to inject cross-cutting concerns (logging sidecars, monitoring agents, secrets managers) automatically, reducing boilerplate in application manifests. This pattern implements policy-as-code required by compliance frameworks (SOC 2, PCI-DSS), replacing manual code reviews with automated enforcement that validates 100% of deployments in milliseconds.

Helm Charts (Examples 74-78)

Example 74: Basic Helm Chart Structure

Helm packages Kubernetes manifests into charts with templating, versioning, and dependency management. Charts enable reusable application definitions.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% Helm chart structure
graph TD
 A[Helm Chart] --> B[Chart.yaml<br/>metadata]
 A --> C[values.yaml<br/>defaults]
 A --> D[templates/<br/>manifests]
 D --> E[deployment.yaml]
 D --> F[service.yaml]
 D --> G[ingress.yaml]
 A --> H[charts/<br/>dependencies]
 
 style A fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style B fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style C fill:#CA9161,stroke:#000000,stroke-width:2px,color:#fff
 style D fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff

# Chart directory structure
# => Helm chart directory layout
my-app/ # => Chart root directory
 # => Contains all chart files
├── Chart.yaml # => Chart metadata
 # => Chart name, version, description
├── values.yaml # => Default configuration values
 # => Base values for all environments
├── templates/ # => Kubernetes manifest templates
 # => Go template files (.yaml)
│ ├── deployment.yaml # => Deployment template
│ # => Creates Kubernetes Deployment
│ ├── service.yaml # => Service template
│ # => Creates Kubernetes Service
│ ├── ingress.yaml # => Ingress template
│ # => Optional Ingress resource
│ └── _helpers.tpl # => Template helpers
│ # => Reusable template functions
│ # => Named templates for labels, names
└── charts/ # => Dependency charts
 # => Subcharts from dependencies
 # => Populated by helm dependency update
 
# Chart.yaml
# => Chart metadata file
apiVersion: v2 # => Chart API version
 # => v2 for Helm 3
name: my-app # => Chart name
 # => Used for chart identification
description: A Helm chart for my application # => Chart description
 # => Human-readable summary
type: application # => Chart type
 # => application or library
version: 1.0.0 # => Chart version
 # => Semantic versioning
appVersion: "1.24" # => Application version
 # => Version of deployed application
keywords: # => Search keywords
 # => For chart repositories
- application # => Keyword: application
 # => Generic application tag
- web # => Keyword: web
 # => Web application category
maintainers: # => Chart maintainers
 # => Contact information
- name: Platform Team # => Maintainer name
 # => Team responsible for chart
 email: [email protected] # => Contact email
 # => Support contact
 
# values.yaml
# => Default values for chart
replicaCount: 3 # => Default replica count
 # => Production default
image: # => Container image settings
 # => Image repository and tag
 repository: nginx # => Image repository
 # => Docker Hub nginx
 tag: "1.24" # => Image tag
 # => Specific version
 pullPolicy: IfNotPresent # => Image pull policy
 # => Pull if not cached locally
service: # => Service configuration
 # => ClusterIP service settings
 type: ClusterIP # => Service type
 # => Internal cluster service
 port: 80 # => Service port
 # => HTTP port 80
ingress: # => Ingress configuration
 # => External access settings
 enabled: false # => Ingress disabled by default
 # => Enable via override
 
# templates/deployment.yaml
# => Deployment template
apiVersion: apps/v1 # => Deployment API version
 # => Stable apps API
kind: Deployment # => Resource type
 # => Manages replica sets
metadata: # => Deployment metadata
 # => Name and labels
 name: {{ include "my-app.fullname" . }} # => Dynamic name
 # => Template helper function
 labels: # => Deployment labels
 # => Standard label set
 {{- include "my-app.labels" . | nindent 4 }}
 # => Template helper for labels
 # => nindent 4: indent 4 spaces
spec: # => Deployment specification
 # => Desired state
 replicas: {{ .Values.replicaCount }} # => Replica count from values
 # => .Values.replicaCount substitution
 selector: # => Pod selector
 # => Matches pods by labels
 matchLabels: # => Label selector
 # => Must match pod labels
 {{- include "my-app.selectorLabels" . | nindent 6 }}
 # => Selector label template
 # => Consistent pod selection
 template: # => Pod template
 # => Defines pod spec
 metadata: # => Pod metadata
 # => Pod labels
 labels: # => Pod labels
 # => Must match selector
 {{- include "my-app.selectorLabels" . | nindent 8 }}
 # => Same labels as selector
 # => nindent 8: 8 spaces
 spec: # => Pod specification
 # => Container configuration
 containers: # => Container list
 # => Single container
 - name: {{ .Chart.Name }} # => Container name
 # => From Chart.yaml name field
 image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
 # => Image with tag
 # => Combines repository + tag
 ports: # => Container ports
 # => Exposed ports
 - containerPort: 80 # => HTTP port
 # => Container listens on 80
 
# Helm commands:
# => Common Helm CLI operations

Key Takeaway: Use Helm for repeatable application deployments with configuration management; separate chart version (Chart.yaml) from app version (appVersion); parameterize manifests using values.yaml for environment-specific deployments.

Why It Matters: Helm standardized Kubernetes package management, enabling complex applications with hundreds of resource manifests to be distributed as single installable charts with configurable parameters. This pattern reduces deployment complexity from multi-page kubectl instructions to a single helm install command, dramatically improving adoption. Helm's versioning and rollback capabilities provide production safety — the Artifact Hub hosts thousands of production-ready charts used by organizations worldwide to deploy vetted, community-maintained applications in minutes.

Example 75: Helm Values and Overrides

Helm values provide hierarchical configuration with multiple override mechanisms. Values can be overridden via CLI, files, or --set flags.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% Helm values precedence
graph TD
 A[values.yaml<br/>defaults] --> B[Merge]
 C[values-dev.yaml<br/>-f flag] --> B
 D[--set replicaCount=5<br/>CLI flag] --> B
 B --> E[Final values]
 E --> F[Template rendering]
 
 style A fill:#CC78BC,stroke:#000000,stroke-width:2px,color:#fff
 style C fill:#CA9161,stroke:#000000,stroke-width:2px,color:#fff
 style D fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style E fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff

# values.yaml (default values)
# => Default values for Helm chart
global: # => Global values shared across templates
 # => Available in all templates via .Values.global
 environment: production # => Default environment setting
 # => Used for env-specific configuration
replicaCount: 3 # => Default replica count (production: high availability)
image: # => Container image configuration
 repository: nginx # => Image repository name (Docker Hub nginx)
 tag: "1.24" # => Image tag version (specific for reproducibility)
resources: # => Resource limits and requests
 requests: # => Minimum guaranteed resources (scheduler placement)
 cpu: 100m # => 100 millicores request (0.1 CPU cores)
 memory: 128Mi # => 128 mebibytes request (minimum guarantee)
 limits: # => Maximum allowed resources
 cpu: 200m # => 200 millicores limit (0.2 CPU cores maximum)
 memory: 256Mi # => 256 mebibytes limit (maximum memory cap)
 
# values-dev.yaml (environment override)
# => Development environment overrides
global: # => Override global values (replaces production defaults)
 environment: development # => Dev environment setting (overrides global.environment)
replicaCount: 1 # => Single replica for development (override for dev)
resources: # => Override resource settings (lower limits for dev)
 requests: # => Reduced dev requests (smaller footprint)
 cpu: 50m # => Half of production request (50 millicores)
 memory: 64Mi # => Half production memory (64 MiB minimum)
 
# templates/deployment.yaml
# => Helm template for Deployment
apiVersion: apps/v1 # => Deployment API version
 # => Stable apps API
kind: Deployment # => Deployment resource type
 # => Manages replica sets
metadata: # => Deployment metadata
 name: {{ .Release.Name }}-app # => Dynamic name from release (.Release.Name injected by Helm)
spec: # => Deployment specification (desired state)
 replicas: {{ .Values.replicaCount }} # => Replica count from values (Production: 3, Dev: 1)
 template: # => Pod template
 spec: # => Pod spec
 containers: # => Container list (single container)
 - name: app # => Main application container name
 image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
 # => Dynamic image reference (combines repository + tag)
 env: # => Environment variables
 - name: ENVIRONMENT # => Standard env var for app config
 value: {{ .Values.global.environment }}
 # => Value from global settings (production or development)
 resources: # => CPU and memory settings
 {{- toYaml .Values.resources | nindent 10 }}
 # => Converts values to YAML (nindent 10: indent 10 spaces)
 
# Helm install with overrides:
# => helm install dev-release my-app -f values-dev.yaml
 
# Override via --set flag:
# => helm install prod-release my-app --set replicaCount=5
 
# Override precedence (highest to lowest):
# => Helm value precedence hierarchy
# 1. --set flags
# => CLI --set has highest priority
# 2. -f values-file.yaml (last file wins)
# => Multiple -f files merge left-to-right
# 3. values.yaml (default)
# => Lowest priority: base defaults

Key Takeaway: Use values.yaml for defaults and environment-specific values files for overrides; leverage --set for one-off changes; understand value precedence to predict final configuration.

Why It Matters: Values-based configuration enables the same Helm chart to deploy across development, staging, and production environments with different resource allocations, database endpoints, and scaling settings. Managing large numbers of Helm releases across environments using parameterized values files implements DRY (Don't Repeat Yourself) principles, reducing configuration drift and copy-paste errors that cause outages. Helm's --set flags enable GitOps workflows where CI/CD pipelines dynamically inject image tags and feature flags during deployment, eliminating hardcoded values that slow release velocity.

Example 76: Helm Chart Dependencies

Helm charts can depend on other charts, enabling composition of complex applications from reusable components.

# Chart.yaml
apiVersion: v2 # => Helm chart API version (v2 = Helm 3)
name: web-application # => Chart name
version: 1.0.0 # => Chart version (SemVer)
dependencies: # => List of chart dependencies (downloaded to charts/ directory)
- name: postgresql # => Dependency chart name (from repository)
 version: 12.1.5 # => Exact version or range (12.x, >=12.0.0 <13.0.0)
 repository: https://charts.bitnami.com/bitnami # => Chart repository URL
 # => Run: helm repo add bitnami https://charts.bitnami.com/bitnami
 condition: postgresql.enabled # => Only install if values.postgresql.enabled=true
 # => Enables optional dependencies per environment
- name: redis
 version: 17.9.3 # => Redis chart version from Bitnami
 repository: https://charts.bitnami.com/bitnami # => Same Bitnami repository
 condition: redis.enabled # => Conditional: disabled in values.yaml below
 
# values.yaml
postgresql:
 enabled: true # => Enable PostgreSQL dependency (condition: postgresql.enabled)
 auth: # => PostgreSQL authentication configuration
 username: myapp # => Database username
 password: changeme # => Password (use Secrets in production)
 database: myapp_db # => Database name to create
 primary:
 persistence:
 size: 10Gi # => PVC size for PostgreSQL data
 
redis:
 enabled: false # => Disable Redis (condition: redis.enabled = false → chart not installed)
 
# Update dependencies (downloads charts to charts/ directory):
# => helm dependency update                  # Downloads/updates all dependencies
# => helm dependency list                    # Shows dependency status
 
# Chart.lock (generated by helm dependency update)
dependencies:
- name: postgresql
 repository: https://charts.bitnami.com/bitnami
 version: 12.1.5 # => Exact locked version (reproducible builds)
- name: redis
 repository: https://charts.bitnami.com/bitnami
 version: 17.9.3 # => Locked even when disabled (ensures reproducibility)
 
# Install with dependencies:
# => helm install my-release web-application  # Installs chart + postgresql (redis disabled)
 
# Override dependency values:
# => helm install my-release web-application --set postgresql.auth.password=newsecret
# => Prefix with dependency name (postgresql.) to override dependency values

Key Takeaway: Use chart dependencies for composable applications; lock dependency versions with Chart.lock for reproducible deployments; use conditions to enable/disable optional dependencies per environment.

Why It Matters: Helm dependencies enable all-in-one application packaging where a single chart installs the application plus required infrastructure (databases, caches, message queues), critical for development environments and demos. Companies like Bitnami package 200+ production-grade charts (WordPress, PostgreSQL, Redis) that are downloaded millions of times monthly. This pattern also enables library charts (common, _helpers) that share reusable templates across organization charts, implementing DRY principles and ensuring consistent labeling, annotations, and security policies across 100+ microservices.

Example 77: Helm Hooks

Helm hooks run Jobs at specific points in release lifecycle, enabling pre/post-install tasks like database migrations or cleanup.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% Helm hook execution
graph TD
 A[helm upgrade] --> B[pre-upgrade hook<br/>weight=1]
 B --> C[Run migration Job]
 C --> D{Migration success?}
 D -->|Yes| E[Deploy main resources]
 D -->|No| F[Upgrade fails]
 E --> G[post-upgrade hook]
 G --> H[Cleanup old resources]
 
 style A fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style B fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style C fill:#CA9161,stroke:#000000,stroke-width:2px,color:#fff
 style E fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff
 style F fill:#CC78BC,stroke:#000000,stroke-width:2px,color:#fff

# templates/db-migration-job.yaml
# => Helm hook Job template
apiVersion: batch/v1 # => Job API version (Batch API for one-time tasks)
kind: Job # => Job resource type (runs to completion)
metadata: # => Job metadata (name and hook annotations)
 name: {{ .Release.Name }}-migration # => Dynamic Job name (release-name-migration)
 annotations: # => Helm hook annotations (control hook behavior)
 "helm.sh/hook": pre-upgrade # => Run before upgrade (blocks upgrade until completion)
 "helm.sh/hook-weight": "1" # => Hook execution order (lower weights run first)
 "helm.sh/hook-delete-policy": before-hook-creation
 # => Delete previous hook Job (prevents Job accumulation)
spec: # => Job specification
 template: # => Pod template (defines migration Pod)
 spec: # => Pod spec
 restartPolicy: Never # => Never restart failed Pods (Job handles failures)
 containers: # => Container list
 - name: migrate # => Container name
 image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
 # => Same image as main app (contains migration scripts)
 command: # => Override container command
 - /bin/sh # => Shell command (execute shell script)
 - -c # => Command string flag (next argument is script)
 - | # => Multiline script
 echo "Running database migrations.."
 # => Log migration start
 ./migrate.sh
 # => Execute migration script (script in container image)
 echo "Migrations complete"
 # => Log migration completion
 
# Available hooks:
# => Helm lifecycle hooks
 
# Hook execution order:
# => Hook weight determines execution sequence
# 1. Hooks with lowest weight execute first
# => weight: -5 runs before weight: 0
# 2. Hooks with same weight execute in alphabetical order
# => Filename sorting for tied weights
# 3. Main resources deployed after hooks succeed
# => Hooks must complete successfully
 
# Hook deletion policies:
# => Control when hooks are cleaned up

Key Takeaway: Use Helm hooks for lifecycle tasks like database migrations, backups, or cleanup; set appropriate hook-delete-policy to prevent accumulation of hook resources; verify hook success before main release continues.

Why It Matters: Helm hooks automate operational tasks that traditionally required manual coordination during deployments—database migrations, schema updates, data seeding, and cleanup. Companies like GitLab and Shopify use pre-upgrade hooks to run database migrations atomically with application deployments, preventing schema/code mismatches. This pattern enables zero-downtime deployments where migrations run automatically while old Pods continue serving traffic, then new Pods start after migration completes. Hooks also implement cleanup automation (pre-delete hooks remove external resources like S3 buckets, DNS records) preventing resource leaks.

Example 78: Helm Tests

Helm tests validate release deployments using Pods with test annotations. Tests ensure applications are healthy after deployment.

# templates/tests/test-connection.yaml
# => Helm test Pod template
apiVersion: v1 # => Core v1 API (Pod API version)
kind: Pod # => Test runs as Pod
metadata: # => Pod metadata
 name: {{ .Release.Name }}-test-connection # => Dynamic test Pod name (release-name-test-connection)
 annotations: # => Helm test annotation (marks as test resource)
 "helm.sh/hook": test # => Test hook (runs on helm test, not during install/upgrade)
spec: # => Pod specification
 restartPolicy: Never # => Run once, preserve exit code for test result
 containers: # => Container list
 - name: wget # => Container name
 image: busybox:1.36 # => Lightweight test image (BusyBox with wget utility)
 command: # => Test command
 - wget # => HTTP client utility
 - --spider # => HEAD request only (check URL without downloading)
 # => Verifies URL reachable
 - {{ .Release.Name }}-service:{{ .Values.service.port }}
 # => Service URL with port (tests internal DNS + connectivity)
 # => Example: my-release-service:80
 
# Run tests:
# => Execute Helm tests for release
 
# Example test output:
# => Successful test execution output
 
# Common test scenarios:
# => Typical Helm test use cases

Key Takeaway: Implement Helm tests to validate releases post-deployment; include tests in CI/CD pipelines to catch deployment issues early; tests provide confidence in production rollouts.

Why It Matters: Helm tests implement deployment validation that catches configuration errors before they impact users — detecting issues like wrong database credentials, missing ConfigMaps, and network policy misconfigurations in microservice deployments. This pattern enables progressive delivery where deployments automatically roll back if health checks fail, reducing mean-time-to-recovery (MTTR) from hours (manual detection) to seconds (automated tests). Helm tests also document expected behavior, serving as executable specifications that ensure deployments meet functional requirements defined by QA teams.

GitOps & ArgoCD (Examples 79-83)

Example 79: GitOps Principles and Repository Structure

GitOps uses Git as single source of truth for declarative infrastructure and applications. Changes committed to Git trigger automatic synchronization to clusters.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% GitOps workflow
graph TD
 A[Developer commits<br/>to Git] --> B[CI builds image]
 B --> C[CI updates manifest<br/>with new image tag]
 C --> D[Push to Git repo]
 D --> E[ArgoCD detects change]
 E --> F[ArgoCD syncs cluster]
 F --> G[Deployment updated]
 
 style A fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style B fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style E fill:#CA9161,stroke:#000000,stroke-width:2px,color:#fff
 style G fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff

# GitOps repository structure
# => Typical GitOps repository layout
gitops-repo/ # => Git repository root
 # => Single source of truth
├── apps/ # => Application manifests
 # => Per-environment application configs
│ ├── production/ # => Production environment
│ # => Production-specific configurations
│ │ ├── app1/ # => Application 1
│ │ # => app1 manifests
│ │ │ ├── deployment.yaml # => Deployment manifest
│ │ │ # => Production Deployment config
│ │ │ ├── service.yaml # => Service manifest
│ │ │ # => Service definition
│ │ │ └── kustomization.yaml # => Kustomize configuration
│ │ │ # => Overlay customizations
│ │ └── app2/ # => Application 2
│ │ # => app2 manifests
│ └── staging/ # => Staging environment
│ # => Staging-specific configs
│ └── app1/ # => app1 staging version
│ # => Lower resources, different config
├── infrastructure/ # => Infrastructure resources
 # => Cluster-wide infrastructure
│ ├── namespaces/ # => Namespace definitions
│ # => All cluster namespaces
│ ├── rbac/ # => RBAC policies
│ # => Roles, RoleBindings
│ └── monitoring/ # => Monitoring stack
│ # => Prometheus, Grafana, etc.
└── clusters/ # => Cluster configurations
 # => Multi-cluster management
 ├── prod-cluster/ # => Production cluster
 # => Prod cluster apps
 │ └── apps/ # => App registration
 │ # => ArgoCD Application CRDs
 │ └── app1.yaml # => ArgoCD Application manifest
 │ # => Points ArgoCD to apps/production/app1
 └── staging-cluster/ # => Staging cluster
 # => Staging cluster configs
 
# GitOps workflow:
# => GitOps deployment pipeline
# 1. Developer commits to Git
# => Code or config change pushed
# 2. CI builds image, updates manifest with new tag
# => CI updates deployment.yaml with new image tag
# 3. GitOps operator (ArgoCD, Flux) detects change
# => Operator polls Git repository
# 4. Operator syncs cluster state to match Git
# => Applies changed manifests to cluster
# 5. Application deployed automatically
# => New Pods created with new image
 
# GitOps benefits:
# => Advantages of GitOps approach
 
# GitOps principles:
# => Core principles of GitOps
# 1. Declarative: desired state in Git
# => YAML manifests define desired state
# 2. Versioned: Git provides versioning
# => Every change is a Git commit
# 3. Immutable: Git commits are immutable
# => Commits cannot be changed (only reverted)
# 4. Automated: reconciliation loop syncs cluster
# => Continuous sync without human intervention
# 5. Auditable: Git log provides audit trail
# => Who changed what and when

Key Takeaway: GitOps treats Git as single source of truth for cluster state; separate application manifests by environment (production, staging); use GitOps operators like ArgoCD for continuous synchronization and drift detection.

Why It Matters: GitOps transforms operational practices by reducing deployment errors through mandatory peer review and automated drift detection. This pattern enforces infrastructure-as-code principles where all changes are versioned, auditable, and reversible — critical for compliance (SOC 2 requires audit trails for infrastructure changes). GitOps also enables disaster recovery through Git — entire Kubernetes clusters can be rebuilt from Git repositories in minutes, meeting recovery-time-objective (RTO) requirements for regulated workloads. The pull-request workflow democratizes infrastructure changes while maintaining control.

Example 80: ArgoCD Installation and Configuration

ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes. ArgoCD monitors Git repositories and synchronizes applications to clusters.

# Install ArgoCD:
# => ArgoCD installation steps
 
# Access ArgoCD UI:
# => Port-forward to ArgoCD server
 
# Get admin password:
# => Retrieve initial admin password
 
# ArgoCD Application CRD
# => ArgoCD Application resource
apiVersion:
 argoproj.io/v1alpha1 # => ArgoCD API version
 # => ArgoCD custom resource API
kind:
 Application # => Application resource type
 # => ArgoCD-specific CRD
metadata: # => Application metadata
 # => Name and namespace
 name:
 my-app # => Application name
 # => Identifies this ArgoCD app
 namespace:
 argocd # => ArgoCD namespace
 # => Applications live in argocd namespace
spec: # => Application specification
 # => Git source and sync configuration
 project:
 default # => ArgoCD project
 # => Logical grouping for apps
 # => default: built-in project
 source: # => Git repository source
 # => Defines where manifests live
 repoURL:
 https://github.com/example/gitops-repo # => Git repository URL
 # => HTTPS Git URL
 # => ArgoCD polls this repo
 targetRevision:
 HEAD # => Git branch/tag/commit
 # => HEAD: track branch tip
 # => Can use tag or commit SHA
 path:
 apps/production/app1 # => Path in repository
 # => Subdirectory with manifests
 # => ArgoCD applies manifests here
 destination: # => Target cluster and namespace
 # => Where to deploy resources
 server:
 https://kubernetes.default.svc # => Target cluster
 # => kubernetes.default.svc: in-cluster
 # => Can target external clusters
 namespace:
 production # => Target namespace
 # => Deploy resources to production namespace
 syncPolicy: # => Synchronization policy
 # => Automated sync configuration
 automated: # => Automated sync settings
 # => Automatic deployment on Git changes
 prune:
 true # => Delete resources not in Git
 # => Remove orphaned resources
 # => Keeps cluster clean
 selfHeal:
 true # => Auto-sync on drift
 # => Revert manual changes
 # => Ensures Git is source of truth
 syncOptions: # => Sync options
 # => Additional sync behaviors
 - CreateNamespace=true # => Auto-create namespace
 # => Creates namespace if missing
 # => No manual namespace creation needed
 
# Apply Application:
# => Create ArgoCD Application resource
 
# ArgoCD CLI:
# => ArgoCD command-line operations

Key Takeaway: Install ArgoCD for GitOps-based deployment automation; define Applications pointing to Git repositories; enable automated sync with selfHeal for drift correction; ArgoCD provides UI and CLI for application management.

Why It Matters: ArgoCD automates deployment workflows, eliminating manual kubectl commands that cause production incidents through wrong context, typos, and outdated manifests. Automated self-heal prevents configuration drift where manual edits diverge from Git, a major source of "works on my cluster" debugging sessions. ArgoCD's multi-cluster management enables platform teams to manage many Kubernetes clusters from a single control plane, ensuring consistent security policies and application versions across development, staging, and production environments.

Example 81: ArgoCD Sync Strategies

ArgoCD supports multiple sync strategies controlling how and when applications synchronize from Git to cluster.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% ArgoCD sync with self-heal
graph TD
 A[Git commit] --> B[ArgoCD detects change]
 B --> C[Auto-sync enabled?]
 C -->|Yes| D[Sync to cluster]
 C -->|No| E[Wait for manual sync]
 D --> F[Drift detected?]
 F -->|Yes + selfHeal| G[Auto-correct drift]
 F -->|No selfHeal| H[Alert only]
 G --> I[Cluster matches Git]
 
 style B fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style D fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style G fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff
 style I fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff

apiVersion:
 argoproj.io/v1alpha1 # => ArgoCD Application API
 # => ArgoCD CRD version
kind:
 Application # => Application resource
 # => Defines sync behavior
metadata: # => Application metadata
 # => Name and namespace
 name:
 sync-strategies-demo # => Application name
 # => Demo application
 namespace:
 argocd # => ArgoCD namespace
 # => All apps in argocd namespace
spec: # => Application specification
 # => Git source and sync policy
 source: # => Git repository source
 # => Where manifests live
 repoURL:
 https://github.com/example/gitops-repo # => Git repository
 # => Source of truth
 path:
 apps/production/demo # => Path in repo
 # => Manifest subdirectory
 destination: # => Target cluster/namespace
 # => Where to deploy
 server:
 https://kubernetes.default.svc # => Cluster API
 # => In-cluster deployment
 namespace:
 production # => Target namespace
 # => Deploy to production
 syncPolicy: # => Synchronization policy
 # => Automated sync configuration
 automated: # => Automated sync settings
 # => Auto-sync configuration
 prune:
 true # => Delete resources removed from Git
 # => Orphan resource cleanup
 # => If resource deleted from Git, delete from cluster
 selfHeal:
 true # => Auto-sync on drift detection
 # => Revert manual changes
 # => Continuous reconciliation
 allowEmpty:
 false # => Prevent deleting all resources
 # => Safety check
 # => Blocks sync if Git path empty
 syncOptions: # => Sync options
 # => Additional sync behaviors
 - Validate=true # => Validate manifests before sync
 # => Dry-run validation first
 # => Prevents invalid YAML
 - CreateNamespace=true # => Auto-create namespace
 # => Creates namespace if missing
 - PrunePropagationPolicy=foreground # => Delete dependents before owner
 # => Foreground cascading deletion
 # => Wait for dependents to delete first
 - PruneLast=true # => Prune after applying
 # => Apply resources first, then prune
 # => Prevents downtime from premature deletion
 retry: # => Retry configuration
 # => Handles transient failures
 limit:
 5 # => Retry limit on sync failure
 # => Maximum 5 retry attempts
 # => Prevents infinite loops
 backoff: # => Backoff configuration
 # => Exponential backoff settings
 duration:
 5s # => Initial retry delay
 # => First retry after 5 seconds
 factor:
 2 # => Exponential backoff
 # => Each retry doubles duration
 # => 5s, 10s, 20s, 40s, 80s..
 maxDuration:
 3m # => Maximum retry delay
 # => Cap at 3 minutes
 # => Prevents excessive waiting
 
# Sync strategies:
# => Different ArgoCD sync approaches
# 1. Manual sync (no automated section)
# => No automated sync policy
 
# 2. Automated sync (automated: {})
# => Basic automated sync
 
# 3. Automated with selfHeal
# => Automated sync with drift correction
 
# 4. Automated with prune
# => Automated sync with cleanup

Key Takeaway: Use automated sync with selfHeal for production to prevent configuration drift; enable prune to remove orphaned resources; configure retry with backoff for resilience against transient failures.

Why It Matters: Sync strategies balance deployment automation (velocity) with safety (preventing outages). Automated ArgoCD sync with self-heal enabled across production applications eliminates configuration drift incidents by continuously reconciling cluster state with the Git source of truth. Sync waves implement ordered deployment patterns critical for stateful applications — deploying dependent services before their dependencies is a common source of rollback events without explicit ordering.

Example 82: ArgoCD Projects and RBAC

ArgoCD Projects provide logical grouping, access control, and restrictions for Applications. Projects enforce source repositories, destination clusters, and allowed resource types.

apiVersion:
 argoproj.io/v1alpha1 # => AppProject API
 # => ArgoCD project CRD
kind:
 AppProject # => AppProject resource
 # => Project definition for grouping apps
metadata: # => Project metadata
 # => Name and namespace
 name:
 production-project # => Project name
 # => Logical grouping name
 namespace:
 argocd # => ArgoCD namespace
 # => Projects live in argocd namespace
spec: # => Project specification
 # => Access control and restrictions
 description:
 Production applications # => Project description
 # => Human-readable summary
 sourceRepos: # => Allowed Git repositories
 # => Whitelist of allowed repos
 - https://github.com/example/gitops-repo # => Allowed source repositories
 # => Applications can only use this repo
 destinations: # => Allowed deployment targets
 # => Whitelist of clusters/namespaces
 - namespace:
 production # => Allowed namespace
 # => Applications can deploy to production
 server:
 https://kubernetes.default.svc # => Allowed destination clusters/namespaces
 # => In-cluster server URL
 # => Applications cannot deploy elsewhere
 clusterResourceWhitelist: # => Allowed cluster-scoped resources
 # => Whitelist for cluster resources
 - group: "" # => Core API group
 # => Empty string = core API
 kind:
 Namespace # => Allow Namespace creation
 # => Can create namespaces
 - group:
 "rbac.authorization.k8s.io" # => RBAC API group
 # => RBAC resources
 kind:
 ClusterRole # => Allow ClusterRole (restricted)
 # => Can create ClusterRoles
 # => Restricted cluster-wide access
 namespaceResourceBlacklist: # => Denied namespace-scoped resources
 # => Blacklist prevents specific resources
 - group:
 "" # => Core API group
 # => Core resources
 kind:
 ResourceQuota # => Deny ResourceQuota changes
 # => Cannot modify resource quotas
 # => Prevents quota bypass
 roles: # => Project RBAC roles
 # => Custom role definitions
 - name:
 developer # => Role name
 # => Developer role
 description:
 Developer access # => Role description
 # => Human-readable purpose
 policies: # => RBAC policies
 # => Permission statements
 - p, proj:production-project:developer, applications, get, production-project/*, allow
 # => Allow get (view) applications
 # => Policy format: p, subject, resource, action, object, effect
 - p, proj:production-project:developer, applications, sync, production-project/*, allow # => Developers can view and sync apps
 # => Allow sync applications
 # => Cannot delete or override
 groups: # => SSO group mapping
 # => Map to external identity groups
 - developers # => Map to SSO group
 # => GitHub/OIDC "developers" group
 
---
apiVersion:
 argoproj.io/v1alpha1 # => Application API
 # => ArgoCD Application CRD
kind:
 Application # => Application resource
 # => Uses project restrictions
metadata: # => Application metadata
 # => Name and namespace
 name:
 prod-app # => Application name
 # => Production application
 namespace:
 argocd # => ArgoCD namespace
 # => Applications in argocd namespace
spec: # => Application specification
 # => Git source and project
 project:
 production-project # => Uses production project
 # => Inherits project restrictions
 # => Must comply with project rules
 source: # => Git source
 # => Repository configuration
 repoURL:
 https://github.com/example/gitops-repo # => Git repository
 # => Must be in project sourceRepos
 path:
 apps/production/app1 # => Path in repository
 # => Manifest directory
 destination: # => Deployment target
 # => Cluster and namespace
 server:
 https://kubernetes.default.svc # => Target cluster
 # => Must be in project destinations
 namespace:
 production # => Target namespace
 # => Must be in project destinations
 
# Project restrictions:
# => How projects enforce policies

Key Takeaway: Use ArgoCD Projects for multi-team environments to enforce repository, cluster, and resource restrictions; configure RBAC roles for granular access control; projects prevent accidental changes to critical resources.

Why It Matters: ArgoCD Projects implement multi-tenancy for large engineering organizations, enabling many development teams to share ArgoCD while maintaining security boundaries that prevent team A from deploying to team B's namespaces. This pattern enforces least-privilege access where frontend teams cannot modify backend infrastructure, reducing blast radius of misconfigurations. Project-based RBAC also implements compliance requirements (SOX, PCI-DSS) where auditors verify that developers cannot modify production resources directly, only through Git + peer review + automated deployment. This segregation of duties is mandatory for regulated industries.

Example 83: ArgoCD ApplicationSets

ApplicationSets generate multiple Applications from templates, enabling fleet management and multi-cluster deployments from single definition.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% ApplicationSet multi-cluster deployment
graph TD
 A[ApplicationSet] --> B[Generator: list]
 B --> C[Cluster: prod-us-west]
 B --> D[Cluster: prod-eu-central]
 B --> E[Cluster: staging]
 C --> F[Application: prod-us-west-app]
 D --> G[App: prod-eu-central]
 E --> H[Application: staging-app]
 
 style A fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style B fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style F fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff
 style G fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff
 style H fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff

apiVersion:
 argoproj.io/v1alpha1 # => ApplicationSet API
 # => ArgoCD ApplicationSet CRD
kind:
 ApplicationSet # => ApplicationSet resource
 # => Application template generator
metadata: # => ApplicationSet metadata
 # => Name and namespace
 name:
 cluster-apps # => ApplicationSet name
 # => Multi-cluster deployment
 namespace:
 argocd # => ArgoCD namespace
 # => ApplicationSets in argocd namespace
spec: # => ApplicationSet specification
 # => Generators and template
 generators: # => Generator list
 # => Defines how to generate Applications
 - list: # => List generator
 # => Static list of parameters
 elements: # => Generator elements
 # => List of cluster configurations
 - cluster:
 prod-us-west # => Cluster parameter
 # => US West production cluster
 url:
 https://prod-us-west.k8s.example.com # => Cluster URL
 # => API server endpoint
 - cluster:
 prod-eu-central # => EU cluster
 # => EU Central production cluster
 url:
 https://prod-eu-central.k8s.example.com # => Cluster URL
 # => EU API endpoint
 - cluster:
 staging # => Staging cluster
 # => Staging environment
 url:
 https://staging.k8s.example.com # => Cluster URL
 # => Staging API endpoint
 template: # => Application template
 # => Template for generated Applications
 metadata: # => Application metadata template
 # => Name uses generator variables
 name:
 "{{cluster}}-app" # => Generated name
 # => Template variable: {{cluster}}
 # => Example: prod-us-west-app
 spec: # => Application spec template
 # => Source and destination
 project:
 default # => ArgoCD project
 # => Use default project
 source: # => Git source
 # => Repository configuration
 repoURL:
 https://github.com/example/gitops-repo # => Git repository
 # => Same repo for all clusters
 path:
 "apps/{{cluster}}" # => Cluster-specific path
 # => Template variable: {{cluster}}
 # => Example: apps/prod-us-west
 targetRevision:
 HEAD # => Git revision
 # => Track branch tip
 destination: # => Deployment target
 # => Cluster and namespace
 server:
 "{{url}}" # => Cluster URL from generator
 # => Template variable: {{url}}
 # => Example: https://prod-us-west.k8s.example.com
 namespace:
 default # => Target namespace
 # => Deploy to default namespace
 syncPolicy: # => Sync policy
 # => Automated sync
 automated: # => Automated sync settings
 # => Auto-sync configuration
 prune:
 true # => Delete orphaned resources
 # => Cleanup enabled
 selfHeal:
 true # => Auto-correct drift
 # => Continuous reconciliation
 
# ApplicationSet generators:
# => Different generator types
# 1. List generator (static list)
# => Hardcoded list of parameters (example above)
# 2. Cluster generator (auto-discover clusters)
# => Discovers clusters registered in ArgoCD
# 3. Git generator (generate from Git directory structure)
# => Reads Git directory/file structure to generate apps
# 4. Matrix generator (combine multiple generators)
# => Combines two generators (cartesian product)
 
# Use cases:
# => Common ApplicationSet patterns
 
# Git directory generator example:
# => Git generator configuration
# generators:
# => Generator list
# - git:
# # => Git generator
# repoURL: https://github.com/example/gitops-repo
# # => Git repository URL
# revision: HEAD
# # => Git revision to scan
# directories:
# # => Directory pattern matching
# - path: apps/*
# # => Match all apps/* directories
# # => Creates one Application per directory

Key Takeaway: Use ApplicationSets for fleet management and multi-cluster deployments; leverage generators to avoid manual Application creation; ApplicationSets enable GitOps at scale for multiple environments and clusters.

Why It Matters: ApplicationSets eliminate configuration duplication by generating multiple Application resources from a single template — one ApplicationSet can replace dozens of nearly-identical Application manifests across environments or clusters. The Pull Request generator enables automated preview environments where every PR gets an isolated test environment, improving QA feedback loops from days to minutes. Multi-cluster ApplicationSets power disaster recovery and multi-region deployments by automatically deploying the same application to many clusters with environment-specific configuration, critical for high-availability SaaS platforms.

Production Patterns (Examples 84-85)

Example 84: Production Monitoring and Observability

Production clusters require comprehensive monitoring, logging, and tracing for observability. Combine Prometheus, Grafana, and Loki for full-stack visibility.

%% Color Palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161
%% Observability stack
graph TD
 A[Application Pods] --> B[Metrics endpoint /metrics]
 A --> C[Logs to stdout/stderr]
 A --> D[Traces with context]
 B --> E[Prometheus scrapes]
 C --> F[Loki aggregates]
 D --> G[Jaeger collects]
 E --> H[Grafana dashboards]
 F --> H
 G --> H
 
 style A fill:#0173B2,stroke:#000000,stroke-width:2px,color:#fff
 style E fill:#DE8F05,stroke:#000000,stroke-width:2px,color:#fff
 style F fill:#CA9161,stroke:#000000,stroke-width:2px,color:#fff
 style G fill:#CC78BC,stroke:#000000,stroke-width:2px,color:#fff
 style H fill:#029E73,stroke:#000000,stroke-width:2px,color:#fff

# Prometheus Operator (monitoring)
# => kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml
# => kubectl get pods -n monitoring          # => prometheus-operator-xxx running
 
apiVersion: monitoring.coreos.com/v1 # => Prometheus Operator CRD API
kind: ServiceMonitor # => Instructs Prometheus to scrape matching Services
metadata: # => ServiceMonitor metadata
 name: app-monitor # => ServiceMonitor name
 # => Must be in same namespace as Prometheus (or allowNamespaces configured)
spec: # => ServiceMonitor specification
 selector: # => Service selection criteria
 matchLabels: # => Label-based Service selector
 app: myapp # => Selects Services with label app=myapp
 # => Service must have this label for auto-discovery
 endpoints: # => Scrape endpoint configuration
 - port: metrics # => Service port name to scrape (named port)
 # => Must match port name in Service spec
 interval: 30s # => Scrape metrics every 30 seconds
 # => Default: 1m. Shorter intervals = more storage + CPU
 path: /metrics # => HTTP path for Prometheus metrics (Prometheus format)
 # => Application must expose Prometheus-compatible metrics here
 
---
# Application with metrics endpoint
apiVersion: v1 # => Core Kubernetes API
kind: Service # => Service exposing metrics port for ServiceMonitor
metadata: # => Service metadata
 name: myapp-service # => Service name
 labels: # => Service labels
 app: myapp # => Label matches ServiceMonitor selector above
 # => Without this label, ServiceMonitor won't discover this Service
spec: # => Service specification
 selector: # => Pod selector
 app: myapp # => Routes to Pods with app=myapp label
 ports: # => Port definitions
 - name: metrics # => Named port (matches ServiceMonitor endpoints.port)
 # => Name enables ServiceMonitor to find correct port
 port: 9090 # => Service port (external)
 targetPort: 9090 # => Container port (where /metrics is exposed)
 
---
# Grafana Dashboard ConfigMap
apiVersion: v1 # => Core Kubernetes API
kind: ConfigMap # => Stores Grafana dashboard JSON definition
metadata: # => ConfigMap metadata
 name: grafana-dashboard # => ConfigMap name
 labels: # => ConfigMap labels
 grafana_dashboard: "1" # => Grafana sidecar watches for this label
 # => Automatically imports dashboard when ConfigMap created
data: # => ConfigMap data
 app-dashboard.json: | # => Key is filename, value is dashboard JSON
 {
 "dashboard": {
 "title": "Application Metrics", # => Dashboard title in Grafana UI
 "panels": [
 {
 "title": "Request Rate", # => Panel title
 "targets": [
 {
 "expr": "rate(http_requests_total[5m])"
 # => PromQL query: per-second request rate over 5m window
 }
 ]
 }
 ]
 }
 }
 
# Logging with Loki (install via Helm)
# => helm install loki grafana/loki-stack    # Installs Loki + Promtail (log collector)
# => kubectl logs pod/app-xyz | promtail     # Logs automatically collected from Pod stdout
 
# Tracing with Jaeger
# => kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/main/deploy/crds/jaegertracing.io_jaegers_crd.yaml
# => Instrument application with OpenTelemetry SDK to send traces
 
# Observability pillars:
# => Metrics: Prometheus (time-series data) - WHAT is happening (request rates, error rates, latency)
# => Logs: Loki (log aggregation) - WHAT happened in detail (error messages, stack traces)
# => Traces: Jaeger (distributed tracing) - WHY it happened (which service caused the latency)

Key Takeaway: Implement three pillars of observability (metrics, logs, traces) for production clusters; use Prometheus for metrics, Loki for logs, and Jaeger for traces; create Grafana dashboards for visualization and alerting.

Why It Matters: Comprehensive observability is non-negotiable for production Kubernetes. The shift from reactive (users report outages) to proactive (alerts fire before user impact) requires real-time monitoring that detects anomalies — CPU spikes, memory leaks, error rate increases — before they become visible to users. Distributed tracing (Tempo/Jaeger) is critical for debugging microservices where a single request spans 20+ services. This observability stack implements SRE principles (SLOs, error budgets) required for production reliability at scale.

Example 85: Production Best Practices Checklist

Production Kubernetes clusters require comprehensive setup across security, reliability, monitoring, and operations. This example provides a complete checklist.

# Production Readiness Validation Commands
 
# 1. Security Hardening Verification
kubectl auth can-i "*" "*" --all-namespaces     # => "no" if RBAC properly restricts access
# ✓ Enable RBAC (api-server --authorization-mode=RBAC)
kubectl get networkpolicies -A                  # => Lists all NetworkPolicies across namespaces
# ✓ Use NetworkPolicies for pod-to-pod traffic control
kubectl get namespaces -o json | grep pod-security
# => pod-security.kubernetes.io/enforce: restricted (if PSS enabled)
# ✓ Enable Pod Security Standards (restricted level)
kubectl get pods -A -o json | jq '.items[].spec.containers[].securityContext.runAsNonRoot'
# => Shows "true" for all compliant containers (non-root requirement)
# ✓ Run containers as non-root
 
# 2. High Availability Verification
kubectl get deployments -A -o json | jq '.items[] | select(.spec.replicas < 3) | .metadata.name'
# => Lists Deployments with fewer than 3 replicas (HA risk)
# ✓ Multiple replicas for all Deployments (min 3)
kubectl get pdb -A                              # => Lists all PodDisruptionBudgets
# => ALLOWED-DISRUPTIONS shows current disruption budget
# ✓ PodDisruptionBudgets for all critical services
kubectl get pods -A -o json | jq '[.items[] | select(.spec.containers[].resources.requests == null)] | length'
# => "0" if all Pods have resource requests set
# ✓ Resource requests and limits for all Pods
 
# 3. Resource Management Verification
kubectl get resourcequotas -A                   # => Lists quotas per namespace
# ✓ ResourceQuotas per namespace
kubectl get hpa -A                              # => Lists all HPAs with current/desired replicas
# ✓ HorizontalPodAutoscaler for dynamic scaling
 
# 4. Monitoring and Observability Verification
kubectl get servicemonitors -A 2>/dev/null      # => Lists ServiceMonitors (Prometheus Operator)
# => "No resources found" if Prometheus Operator not installed
# ✓ Prometheus + ServiceMonitors for metrics collection
 
# 5. GitOps and Deployment Verification
kubectl get applications -n argocd 2>/dev/null  # => Lists ArgoCD Applications
# => Shows SYNC STATUS (Synced/OutOfSync) and HEALTH (Healthy/Degraded)
# ✓ ArgoCD for continuous deployment
 
# 6. Backup and Disaster Recovery Verification
kubectl get volumesnapshots -A 2>/dev/null      # => Lists existing VolumeSnapshots
# ✓ VolumeSnapshots for data backup
ETCDCTL_API=3 etcdctl snapshot status /var/lib/etcd-backup.db 2>/dev/null
# => Shows snapshot hash, revision, total keys (etcd backup verification)
# ✓ Backup etcd snapshots daily
 
# 7. Network and Storage Verification
kubectl get ingressclasses                      # => Lists available Ingress classes
# ✓ Ingress with TLS termination configured
kubectl get storageclasses                      # => Lists StorageClasses (default marked with *)
# ✓ StorageClasses with dynamic provisioning
 
# 8. Operations Summary Health Check
kubectl get nodes                               # => All nodes should show STATUS: Ready
kubectl get pods -A | grep -v Running | grep -v Completed
# => Should show empty (no non-Running/Completed pods in production)
kubectl top nodes                               # => CPU and memory utilization per node

Key Takeaway: Production readiness requires comprehensive setup across security, reliability, monitoring, and operations; use this checklist to validate cluster readiness; automate validation using admission controllers and policy engines like OPA/Kyverno.

Why It Matters: Production Kubernetes readiness is not binary — it is a set of practices accumulated across years of real outages and postmortems. This checklist codifies patterns from the Kubernetes community, SRE practices, and the CNCF ecosystem. Automated policy enforcement (OPA Gatekeeper, Kyverno) makes compliance a CI gate rather than a code review checklist — preventing misconfigured workloads from reaching production regardless of who writes the YAML. The SRE principles embedded here (error budgets, SLOs, chaos engineering) operationalize reliability, enabling higher deployment velocity while reducing incident frequency. Production readiness is not optional — it is the difference between experimental clusters and platforms running critical business workloads.

Summary

Advanced level (75-95% coverage) covered:

RBAC & Security (Examples 58-63): ServiceAccounts, Roles, ClusterRoles, SecurityContext, encryption at rest
Network Policies (Examples 64-68): Pod-to-pod traffic control, namespace isolation, egress restrictions, complex selectors
Custom Resources & Operators (Examples 69-73): CRDs, subresources, operator pattern, OLM, admission webhooks
Helm Charts (Examples 74-78): Chart structure, values, dependencies, hooks, tests
GitOps & ArgoCD (Examples 79-83): GitOps principles, ArgoCD installation, sync strategies, projects, ApplicationSets
Production Patterns (Examples 84-85): Monitoring/observability, production best practices

Congratulations! You've completed all 85 examples achieving 95% coverage of production Kubernetes knowledge.

Next steps:

Apply these patterns in production environments
Explore Beginner Examples for fundamentals review
Or review Intermediate Examples for production patterns

All examples are production-ready and battle-tested. Happy Kubernetes mastery!

Advanced

Command Palette