
Kubernetes Knowledge Overview - Core Components

Pod#

In Kubernetes, the primary attributes of almost all resources are the same, mainly consisting of five parts:

  1. apiVersion <string> The API version, defined internally by Kubernetes; the available versions can be queried with kubectl api-versions.
  2. kind <string> The resource type, defined internally by Kubernetes; the available types can be queried with kubectl api-resources.
  3. metadata <object> Metadata, mainly resource identification and description, commonly includes name, namespace, labels, etc.
  4. spec <object> Description, this is the most important part of the configuration, containing detailed descriptions of various resource configurations.
  5. status <object> Status information, the content inside does not need to be defined and is automatically generated by Kubernetes.
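As a quick illustration, a minimal Pod manifest contains the first four parts, while status is filled in by Kubernetes at runtime. The name below is just a placeholder:

apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: dev
spec:
  containers:
    - name: nginx
      image: nginx:1.17.1
# status is generated automatically by Kubernetes and is not written by the user.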

Pod Lifecycle#

The time range from the creation to the termination of a Pod object is called the Pod lifecycle. The main processes of its lifecycle are as follows:

  1. Pod creation
  2. Running initialization containers
  3. Running main containers
    (1) Startup hooks, termination hooks
    (2) Liveness probes, readiness probes
  4. Pod termination


Throughout its lifecycle, a Pod can be in five states, as follows:

  1. Pending: The apiserver has created the Pod resource object, but it has not yet been scheduled or is still in the process of downloading the image.
  2. Running: The Pod has been scheduled to a node, and all containers have been created by kubelet.
  3. Succeeded: All containers in the Pod have successfully terminated and will not be restarted.
  4. Failed: All containers have terminated, but at least one container terminated with a failure, meaning it returned a non-zero exit status.
  5. Unknown: The apiserver cannot retrieve the status information of the Pod normally, usually due to network communication failure.
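For example, the current phase can be read directly from a Pod's status field (the Pod name here is hypothetical):

kubectl get pod pod-demo -n dev -o jsonpath='{.status.phase}'
# Or inspect the full status section:
kubectl get pod pod-demo -n dev -o yaml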

Pod Creation Process#

The Pod configuration is submitted to the apiserver via kubectl. The apiserver converts the Pod information, stores it in etcd, and returns an acknowledgement to the client. The scheduler watches the apiserver for new Pods, uses its scheduling algorithms to assign a host to the Pod, and updates this information in the apiserver. The kubelet on the chosen node sees the updated information, creates the containers, and reports the result back to the apiserver. The apiserver stores the final state in etcd, and the Pod creation is complete.

  1. The user submits the Pod information to be created to the apiserver via kubectl or other API clients.
  2. The apiserver begins generating the Pod object information and stores it in etcd, then returns confirmation information to the client.
  3. The apiserver begins reflecting changes to the Pod object in etcd, and other components use the watch mechanism to track changes on the apiserver.
  4. The scheduler discovers that a new Pod object needs to be created, begins allocating hosts for the Pod, and updates the result information to the apiserver.
  5. The kubelet on the node discovers that a Pod has been scheduled, attempts to call Docker to start the container, and sends the result back to the apiserver.
  6. The apiserver stores the received Pod status information in etcd.


Pod Termination Process#

The user sends a command to delete the Pod; the apiserver accepts the request and updates the Pod's state to terminating. The kubelet detects this and starts the Pod shutdown process, while the endpoint controller removes the Pod from the endpoint lists of all matching Service resources. Once the Pod has stopped running, the kubelet asks the apiserver to set the Pod's grace period to 0 to complete the deletion. The apiserver removes the final record from etcd, and the Pod deletion is complete.

  1. The user sends a command to the apiserver to delete the Pod object.
  2. The Pod object in the apiserver is updated with the deadline after which it will be considered dead, based on the grace period (30s by default).
  3. The Pod is marked as terminating.
  4. The kubelet starts the Pod shutdown process as soon as it detects that the Pod object has changed to the terminating state.
  5. The endpoint controller removes the Pod object from the endpoint list of all matching service resources when it detects the Pod object's shutdown behavior.
  6. If the current Pod object defines a preStop hook handler, it will be executed synchronously as soon as it is marked as terminating.
  7. The container process in the Pod object receives a stop signal.
  8. After the grace period ends, if there are still running processes in the Pod, the Pod object will receive an immediate termination signal.
  9. The kubelet requests the apiserver to set the grace period of this Pod resource to 0 to complete the deletion operation, at which point the Pod is no longer visible to the user.

Initialization Containers#

Initialization containers are containers that must run before the main container of the Pod starts, mainly to perform some preparatory work for the main container. They have two main characteristics:

  1. Initialization containers must run to completion. If an initialization container fails, Kubernetes restarts it until it succeeds.
  2. Initialization containers run in the defined order; the next one starts only after the previous one has succeeded.

Initialization containers have many application scenarios; the most common are:

  • Providing tools or custom code that are not available in the main container image.
  • Delaying the start of application containers until their dependencies are met, since initialization containers must start and run to completion before the application containers.

Next, let's do a case to simulate the following requirement: run Nginx as the main container, but require that the servers hosting MySQL and Redis are reachable before Nginx starts.
To simplify testing, we predefine the MySQL and Redis addresses as 192.168.18.103 and 192.168.18.104 (note that these two IPs cannot be pinged, because they do not exist in this environment).
Create a file named pod-initcontainer.yaml with the following content:
apiVersion: v1
kind: Pod
metadata:
  name: pod-initcontainer
  namespace: dev
  labels:
    user: xudaxian
spec:
  containers: # Container configuration
    - name: nginx
      image: nginx:1.17.1
      imagePullPolicy: IfNotPresent
      ports:
        - name: nginx-port
          containerPort: 80
          protocol: TCP
      resources:
        limits:
          cpu: "2"
          memory: "10Gi"
        requests:
          cpu: "1"
          memory: "10Mi"
  initContainers: # Initialization container configuration
    - name: test-mysql
      image: busybox:1.30
      command: ["sh","-c","until ping 192.168.18.103 -c 1;do echo waiting for mysql ...;sleep 2;done;"]
      securityContext:
        privileged: true # Run the container in privileged mode
    - name: test-redis
      image: busybox:1.30
      command: ["sh","-c","until ping 192.168.18.104 -c 1;do echo waiting for redis ...;sleep 2;done;"]

After executing the command, if test-mysql is not created successfully, the subsequent containers cannot be created either. After modifying the IP to an accessible IP and re-executing the command, they will be created successfully in order.
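One possible way to run this case and observe the init containers blocking the main container (assuming the dev namespace already exists):

kubectl create -f pod-initcontainer.yaml
kubectl get pod pod-initcontainer -n dev -w   # The Pod stays in Init:0/2 until the pings succeed.
kubectl describe pod pod-initcontainer -n dev # Inspect the state of each init container.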

Hook Functions#

Kubernetes provides two lifecycle hook functions, one that runs after the main container starts and one that runs before it stops:

  • postStart: Executes after the container is created; if it fails, the container will be restarted.

  • preStop: Executes before the container terminates; the container is terminated only after the hook finishes, so the deletion operation is blocked until it completes.

Hook handlers support defining actions using the following three methods:

  • exec command: Execute a command once inside the container.

    .......
    lifecycle:
      postStart:
        exec:
          command:
            - cat
            - /tmp/healthy
    .......

  • tcpSocket: Attempts to access the specified socket in the current container.

    .......
    lifecycle:
      postStart:
        tcpSocket:
          port: 8080
    .......

  • httpGet: Initiates an HTTP request to a URL in the current container.

    .......
    lifecycle:
      postStart:
        httpGet:
          path: / # URI address
          port: 80 # Port number
          host: 192.168.109.100 # Host address
          scheme: HTTP # Supported protocol, HTTP or HTTPS
    .......
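Putting the snippets together, a sketch of a Pod that uses both hooks might look like this; the Pod name and the two commands are only examples:

apiVersion: v1
kind: Pod
metadata:
  name: pod-hook-exec
  namespace: dev
spec:
  containers:
    - name: main-container
      image: nginx:1.17.1
      ports:
        - name: nginx-port
          containerPort: 80
      lifecycle:
        postStart:
          exec: # Rewrite the default Nginx page right after the container starts.
            command: ["/bin/sh", "-c", "echo postStart... > /usr/share/nginx/html/index.html"]
        preStop:
          exec: # Stop Nginx gracefully before the container terminates.
            command: ["/usr/sbin/nginx", "-s", "quit"]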

Container Probes#

Container probes are used to check whether the application instances in the container are functioning properly, which is a traditional mechanism to ensure business availability. If the probe indicates that the instance's state does not meet expectations, Kubernetes will "remove" the problematic instance and not handle business traffic. Kubernetes provides two types of probes to implement container probing:

  • liveness probes: Used to check whether the application instance is currently running normally; if not, k8s will restart the container.

  • readiness probes: Used to check whether the application instance can accept requests; if not, k8s will not forward traffic.

livenessProbe: Determines whether to restart the container.
readinessProbe: Determines whether to forward requests to the container.

Kubernetes 1.16 introduced the startupProbe, used to determine whether the application in the container has started. If a startupProbe is configured, the other probes are disabled until it succeeds; once it has succeeded, the startup probe is not run again.

Each of the probes above supports three probing methods:

  • exec command: Execute a command once inside the container; if the exit code of the command is 0, the program is considered normal; otherwise, it is not.

    ……
    livenessProbe:
      exec:
        command:
          - cat
          - /tmp/healthy
    ……

  • tcpSocket: Attempts to access a port of the user container; if a connection can be established, the program is considered normal; otherwise, it is not.

    ……
    livenessProbe:
      tcpSocket:
        port: 8080
    ……

  • httpGet: Calls a URL of the web application in the container; if the returned status code is between 200 and 399, the program is considered normal; otherwise, it is not.

    ……
    livenessProbe:
      httpGet:
        path: / # URI address
        port: 80 # Port number
        host: 127.0.0.1 # Host address
        scheme: HTTP # Supported protocol, HTTP or HTTPS
    ……
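As a sketch, the following Pod combines a TCP liveness probe with an HTTP readiness probe; the timing parameters and Pod name are illustrative only:

apiVersion: v1
kind: Pod
metadata:
  name: pod-probe
  namespace: dev
spec:
  containers:
    - name: nginx
      image: nginx:1.17.1
      ports:
        - containerPort: 80
      livenessProbe:
        tcpSocket:
          port: 80             # Restart the container if port 80 stops accepting connections.
        initialDelaySeconds: 5 # Wait 5s after the container starts before the first probe.
        periodSeconds: 10      # Probe every 10s.
      readinessProbe:
        httpGet:
          path: /              # Stop forwarding traffic if the page is not returned successfully.
          port: 80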

Restart Policy#

In container probing, once a container's probe detects a problem, Kubernetes restarts the container; exactly how it does so is determined by the Pod's restart policy. The Pod restart policy has three settings, as follows:

  • Always: Automatically restarts the container when it fails; this is the default value.
  • OnFailure: Restarts the container when it terminates and the exit code is not 0.
  • Never: Does not restart the container regardless of its state.

The restart policy applies to all containers in the Pod object. The first required restart is performed immediately; subsequent restarts are delayed by the kubelet with increasing intervals of 10s, 20s, 40s, 80s, 160s, and 300s, where 300s is the maximum delay.
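The restart policy is set at the Pod level; a sketch of where the field sits in a Pod spec (the value shown is only for illustration):

spec:
  restartPolicy: OnFailure # Always (default) | OnFailure | Never
  containers:
    - name: nginx
      image: nginx:1.17.1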

Pod Scheduling#

By default, the Node a Pod runs on is chosen by the Scheduler component using its scheduling algorithms, and this process is not controlled manually. In practice, however, we often want to control which Nodes certain Pods are placed on. To do that, we need to understand Kubernetes' scheduling rules for Pods. Kubernetes provides four major types of scheduling methods:

  • Automatic scheduling: The Node on which it runs is completely determined by the Scheduler through a series of algorithm calculations.
  • Directed scheduling: NodeName, NodeSelector.
  • Affinity scheduling: NodeAffinity, PodAffinity, PodAntiAffinity.
  • Taint/toleration scheduling: Taints, Tolerations.

Directed Scheduling#

Directed scheduling refers to using the nodeName or nodeSelector declared on the Pod to schedule the Pod to the desired Node. Note that this scheduling is mandatory, meaning that even if the target Node to be scheduled does not exist, it will still attempt to schedule it, but the Pod will fail to run.

nodeName#

nodeName is used to force the Pod to be scheduled on a Node with the specified name. This method actually skips the scheduling logic of the Scheduler and directly schedules the Pod to the specified named node.
Create a file named pod-nodename.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nodename
  namespace: dev
  labels:
    user: xudaxian
spec:
  containers: # Container configuration
    - name: nginx
      image: nginx:1.17.1
      imagePullPolicy: IfNotPresent
      ports:
        - name: nginx-port
          containerPort: 80
          protocol: TCP
  nodeName: k8s-node1 # Specify scheduling to the k8s-node1 node

nodeSelector#

nodeSelector is used to schedule the Pod to Node nodes that have specific labels added. It is implemented through Kubernetes' label-selector mechanism. In other words, before the Pod is created, the Scheduler will use the MatchNodeSelector scheduling strategy to perform label matching, find the target node, and then schedule the Pod to the target node. This matching rule is mandatory.

apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeselector
  namespace: dev
spec:
  containers: # Container configuration
    - name: nginx
      image: nginx:1.17.1
      imagePullPolicy: IfNotPresent
      ports:
        - name: nginx-port
          containerPort: 80
          protocol: TCP
  nodeSelector:
    nodeenv: pro # Specify scheduling to Node nodes with nodeenv=pro
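For the nodeSelector to match anything, the target node must carry the label first; assuming the node is named k8s-node1 as in the earlier example:

kubectl label nodes k8s-node1 nodeenv=pro   # Add the label.
kubectl get nodes --show-labels             # Verify the label.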

Affinity Scheduling#

Although the two methods of directed scheduling are very convenient to use, they also have certain issues, namely that if there are no Nodes that meet the conditions, the Pod will not run, even if there are available Nodes in the cluster. This limits its use cases.
To address the above issues, Kubernetes also provides an affinity scheduling (Affinity). It extends the nodeSelector and can be configured to prioritize selecting Nodes that meet the conditions for scheduling; if not, it can also schedule to Nodes that do not meet the conditions, making scheduling more flexible. Affinity is mainly divided into three categories:

  • nodeAffinity (Node Affinity): Targets Nodes and solves the problem of which Nodes the Pod can be scheduled to.
  • podAffinity (Pod Affinity): Targets Pods and solves the problem of which existing Pods can be deployed in the same topology domain.
  • podAntiAffinity (Pod Anti-Affinity): Targets Pods and solves the problem of which existing Pods cannot be deployed in the same topology domain.

Explanation of the usage scenarios for affinity and anti-affinity:

  • Affinity: If two applications frequently interact, it is necessary to use affinity to keep the two applications as close as possible, thus reducing performance loss due to network communication.
  • Anti-affinity: When applications are deployed in multiple replicas, it is necessary to use anti-affinity to scatter the instances of each application across different Nodes, thus improving service availability.

nodeAffinity (Node Affinity)#

View the optional configuration items for nodeAffinity:

pod.spec.affinity.nodeAffinity
  requiredDuringSchedulingIgnoredDuringExecution  Node nodes must meet all specified rules to be scheduled, equivalent to a hard limit.
    nodeSelectorTerms  Node selection list
      matchFields   Node selector requirements listed by node fields.  
      matchExpressions   Node selector requirements listed by node labels (recommended).
        key    Key
        values Value
        operator Relationship operator supports Exists, DoesNotExist, In, NotIn, Gt, Lt.
  preferredDuringSchedulingIgnoredDuringExecution Prefer to schedule to Nodes that meet specified rules, equivalent to a soft limit (preference).
    preference   A node selector item associated with a corresponding weight.
      matchFields Node selector requirements listed by node fields.
      matchExpressions Node selector requirements listed by node labels (recommended).
        key Key
        values Value
        operator Relationship operator supports In, NotIn, Exists, DoesNotExist, Gt, Lt.
    weight Preference weight, in the range of 1-100.

Explanation of the use of relationship operators:

- matchExpressions:
    - key: nodeenv # Match nodes with the key of nodeenv that have the label.
      operator: Exists   
    - key: nodeenv # Match nodes with the key of nodeenv, and the value is "xxx" or "yyy".
      operator: In    
      values: ["xxx","yyy"]
    - key: nodeenv # Match nodes with the key nodeenv whose value is greater than "xxx".
      operator: Gt
      values: ["xxx"]

Demonstration of requiredDuringSchedulingIgnoredDuringExecution:
Create a file named pod-nodeaffinity-required.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity-required
  namespace: dev
spec:
  containers: # Container configuration
    - name: nginx
      image: nginx:1.17.1
      imagePullPolicy: IfNotPresent
      ports:
        - name: nginx-port
          containerPort: 80
          protocol: TCP
  affinity: # Affinity configuration
    nodeAffinity: # Node affinity configuration
      requiredDuringSchedulingIgnoredDuringExecution: # Node nodes must meet all specified rules to be scheduled, equivalent to a hard rule, similar to directed scheduling.
        nodeSelectorTerms: # Node selection list
          - matchExpressions:
              - key: nodeenv # Match nodes with the key of nodeenv that have the label, and the value is "xxx" or "yyy".
                operator: In
                values:
                  - "xxx"
                  - "yyy"

Notes on nodeAffinity:

  • If both nodeSelector and nodeAffinity are defined, both conditions must be met for the Pod to run on the specified Node.
  • If nodeAffinity specifies multiple nodeSelectorTerms, only one needs to match successfully.
  • If there are multiple matchExpressions in a nodeSelectorTerms, a node must meet all of them to match successfully.
  • If the labels of the Node where a Pod is located change during the Pod's operation and no longer meet the Pod's nodeAffinity requirements, the system will ignore this change.

podAffinity (Pod Affinity)#

podAffinity mainly implements the function of allowing newly created Pods to be deployed in the same area as the reference Pods.
Optional configuration items for PodAffinity:

pod.spec.affinity.podAffinity
  requiredDuringSchedulingIgnoredDuringExecution  Hard limit.
    namespaces Specifies the namespace of the reference Pod.
    topologyKey Specifies the scheduling scope.
    labelSelector Label selector.
      matchExpressions  Node selector requirements listed by node labels (recommended).
        key    Key
        values Value
        operator Relationship operator supports In, NotIn, Exists, DoesNotExist.
      matchLabels    Content mapped by multiple matchExpressions.  
  preferredDuringSchedulingIgnoredDuringExecution Soft limit.    
    podAffinityTerm  Options.
      namespaces
      topologyKey
      labelSelector
         matchExpressions 
            key    Key  
            values Value  
            operator
         matchLabels 
    weight Preference weight, in the range of 1-100.

topologyKey is used to specify the scope of scheduling, for example:

  • If specified as kubernetes.io/hostname, it distinguishes based on Node nodes.
  • If specified as beta.kubernetes.io/os, it distinguishes based on the operating system type of Node nodes.

Demonstration of requiredDuringSchedulingIgnoredDuringExecution.
Create a file named pod-podaffinity-required.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: pod-podaffinity-required
  namespace: dev
spec:
  containers: # Container configuration
    - name: nginx
      image: nginx:1.17.1
      imagePullPolicy: IfNotPresent
      ports:
        - name: nginx-port
          containerPort: 80
          protocol: TCP
  affinity: # Affinity configuration
    podAffinity: # Pod affinity
      requiredDuringSchedulingIgnoredDuringExecution: # Hard limit
        - labelSelector:
            matchExpressions: # This Pod must be on the same Node as Pods with the label podenv=xxx or podenv=yyy; obviously, there are no such Pods.
              - key: podenv
                operator: In
                values:
                  - "xxx"
                  - "yyy"
          topologyKey: kubernetes.io/hostname

podAntiAffinity (Pod Anti-Affinity)#

podAntiAffinity mainly implements the function of preventing newly created Pods from being deployed in the same area as the reference Pods.
Its configuration method is the same as podAffinity.

apiVersion: v1
kind: Pod
metadata:
  name: pod-podantiaffinity-required
  namespace: dev
spec:
  containers: # Container configuration
    - name: nginx
      image: nginx:1.17.1
      imagePullPolicy: IfNotPresent
      ports:
        - name: nginx-port
          containerPort: 80
          protocol: TCP
  affinity: # Affinity configuration
    podAntiAffinity: # Pod anti-affinity
      requiredDuringSchedulingIgnoredDuringExecution: # Hard limit
        - labelSelector:
            matchExpressions:
              - key: podenv
                operator: In
                values:
                  - "pro"
          topologyKey: kubernetes.io/hostname

Taints and Tolerations#

Taints#

The previous scheduling methods are all based on the perspective of the Pod, determining whether to schedule the Pod to a specified Node by adding attributes to the Pod. In fact, we can also approach it from the perspective of the Node, deciding whether to run Pods scheduled to it by adding taint attributes to the Node.
Once a Node is tainted, it creates an exclusion relationship with Pods, thereby rejecting Pods from being scheduled in, and can even evict existing Pods.
The format of a taint is key=value:effect, where key and value are the taint's label and effect describes the taint's behavior. effect supports the following three options:

  • PreferNoSchedule: Kubernetes will try to avoid scheduling Pods to Nodes with this taint unless no other nodes are available.
  • NoSchedule: Kubernetes will not schedule Pods to Nodes with this taint, but it will not affect Pods that already exist on the current Node.
  • NoExecute: Kubernetes will not schedule Pods to Nodes with this taint, and will also evict existing Pods on the Node.
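Taints are added and removed with kubectl taint; the node name and the key "tag" below follow the examples used elsewhere in this article, and the value is a placeholder:

# Add a taint.
kubectl taint node k8s-node1 tag=value1:NoSchedule
# Remove the taint with the given key and effect.
kubectl taint node k8s-node1 tag:NoSchedule-
# Remove all taints with the given key.
kubectl taint node k8s-node1 tag-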


Tolerations#

The above introduced the function of taints; we can add taints to Nodes to refuse Pods from being scheduled. However, if we want a Pod to be scheduled to a Node with taints, what should we do? This is where tolerations come into play.

Taints are refusals, and tolerations are ignores. Nodes refuse Pods through taints, and Pods ignore refusals through tolerations.

The detailed configuration of tolerations:

kubectl explain pod.spec.tolerations
......
FIELDS:
  key       # Corresponds to the key of the taint to tolerate; empty means matching all keys.
  value     # Corresponds to the value of the taint to tolerate.
  operator  # The operator between key and value; supports Equal (default) and Exists.
  effect    # Corresponds to the effect of the taint; empty means matching all effects.
  tolerationSeconds   # Toleration time, effective when the effect is NoExecute, indicating the Pod's stay time on the Node.

When the operator is Equal and the Node has multiple taints, the Pod must tolerate every taint in order to be deployed there.
When the operator is Exists, there are three ways to write a toleration:

  • Tolerate the specified taint with the specified effect:

  tolerations: # Tolerations
    - key: "tag" # The key of the taint to tolerate.
      operator: Exists # Operator.
      effect: NoExecute # The effect must match the effect of the taint on the node.

  • Tolerate the specified taint, regardless of its effect:

  tolerations: # Tolerations
    - key: "tag" # The key of the taint to tolerate.
      operator: Exists # Operator.

  • Tolerate all taints (use with caution):

  tolerations: # Tolerations
    - operator: Exists # Operator.
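A sketch of a complete Pod that tolerates the tag taint added earlier; the key, value, and effect are assumptions matching those examples:

apiVersion: v1
kind: Pod
metadata:
  name: pod-toleration
  namespace: dev
spec:
  containers:
    - name: nginx
      image: nginx:1.17.1
  tolerations: # Tolerations
    - key: "tag"          # The key of the taint to tolerate.
      operator: "Equal"   # Operator; Equal also requires the value to match.
      value: "value1"     # The value of the taint to tolerate.
      effect: "NoExecute" # Must match the effect of the taint on the node.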

Pod Controllers#

In Kubernetes, Pods can be divided into two categories based on how they are created:

  • Standalone Pods: Pods created directly rather than through a controller; once deleted, they are gone and will not be rebuilt.
  • Controller-created Pods: Pods created by Pod controllers; these Pods will be automatically rebuilt after deletion.

Pod controllers are the intermediate layer managing Pods. Once we use a Pod controller, we only need to tell the Pod controller how many Pods of what type we want, and it will create Pods that meet the conditions and ensure each Pod is in the user-expected state. If a Pod fails while running, the controller will restart or rebuild the Pod based on the specified policy.
There are many types of Pod controllers in Kubernetes, each suitable for its own scenarios. The common ones include the following:

  • ReplicationController: A relatively primitive Pod controller that has been deprecated and replaced by ReplicaSet.
  • ReplicaSet: Ensures a specified number of Pods are running and supports changes in the number of Pods and image versions.
  • Deployment: Controls Pods through ReplicaSet and supports rolling upgrades and version rollbacks.
  • Horizontal Pod Autoscaler: Can automatically adjust the number of Pods based on cluster load, achieving peak shaving and filling.
  • DaemonSet: Runs a replica on specified Nodes in the cluster, generally used for daemon-like tasks.
  • Job: The Pods it creates exit immediately after completing the task, used for executing one-time tasks.
  • CronJob: The Pods it creates execute periodically, used for executing periodic tasks.
  • StatefulSet: Manages stateful applications.

ReplicaSet (RS)#

The main function of ReplicaSet is to ensure that a certain number of Pods can run normally. It continuously monitors the running status of these Pods, and once a Pod fails, it will restart or rebuild it. It also supports scaling the number of Pods.


The resource manifest file for ReplicaSet:

apiVersion: apps/v1 # Version number 
kind: ReplicaSet # Type 
metadata: # Metadata 
  name: # rs name
  namespace: # Namespace 
  labels: # Labels 
    controller: rs 
spec: # Detailed description 
  replicas: 3 # Number of replicas 
  selector: # Selector, specifies which Pods this controller manages.
    matchLabels: # Labels matching rules 
      app: nginx-pod 
    matchExpressions: # Expressions matching rules 
      - {key: app, operator: In, values: [nginx-pod]} 
  template: # Template, when the number of replicas is insufficient, Pods will be created based on the template below.
    metadata:
      labels:
        app: nginx-pod
    spec:
      containers:
        - name: nginx
          image: nginx:1.17.1
          ports:
            - containerPort: 80

Here, the new configuration items to understand are several options under spec:

  • replicas: Specifies the number of replicas, which is actually the number of Pods created by the current RS, defaulting to 1.
  • selector: The selector establishes the association between the Pod controller and the Pods, using the Label Selector mechanism (defining Labels on the Pod module and defining selectors on the controller indicates which Pods the current controller can manage).
  • template: The template is the template used by the current controller to create Pods; it is essentially the Pod definition we learned earlier.
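Scaling an existing ReplicaSet can be done either by editing the manifest or directly from the command line; the rs name here is hypothetical:

# Scale to 5 replicas.
kubectl scale rs pc-replicaset --replicas=5 -n dev
# Or edit the replicas field interactively.
kubectl edit rs pc-replicaset -n dev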

Deployment (Deploy)#

To better solve the problem of service orchestration, Kubernetes introduced the Deployment controller starting from version 1.2. It is worth mentioning that the Deployment controller does not directly manage Pods but indirectly manages Pods by managing ReplicaSets, meaning that the functionality of Deployment is more powerful than that of ReplicaSet.

The main functions of Deployment are as follows:

  • Supports all functions of ReplicaSet.
  • Supports stopping and continuing deployments.
  • Supports rolling updates and version rollbacks.

The resource manifest for Deployment:

apiVersion: apps/v1 # Version number 
kind: Deployment # Type 
metadata: # Metadata 
  name: # deploy name
  namespace: # Namespace 
  labels: # Labels 
    controller: deploy 
spec: # Detailed description 
  replicas: 3 # Number of replicas 
  revisionHistoryLimit: 3 # Retain historical versions, default is 10 
  paused: false # Pause deployment, default is false 
  progressDeadlineSeconds: 600 # Deployment timeout (s), default is 600 
  strategy: # Strategy 
    type: RollingUpdate # Rolling update strategy 
    rollingUpdate: # Rolling update 
      maxSurge: 30% # Maximum additional replicas that can exist, can be a percentage or an integer.
      maxUnavailable: 30% # Maximum number of Pods that can be unavailable during the update, can be a percentage or an integer.
  selector: # Selector, specifies which Pods this controller manages.
    matchLabels: # Labels matching rules 
      app: nginx-pod 
    matchExpressions: # Expressions matching rules 
      - {key: app, operator: In, values: [nginx-pod]} 
  template: # Template, when the number of replicas is insufficient, Pods will be created based on the template below.
    metadata: 
      labels: 
        app: nginx-pod 
    spec: 
      containers: 
      - name: nginx 
        image: nginx:1.17.1 
        ports: 
        - containerPort: 80

Deployment supports two image update strategies: Recreate and RollingUpdate (default), which can be configured through the strategy option.

strategy: Specifies the strategy for replacing old Pods with new Pods, supporting two attributes.
  type: Specifies the strategy type, supporting two strategies.
    Recreate: All existing Pods will be killed before creating new Pods.
    RollingUpdate: Rolling update, which kills some and starts some, during the update process, both versions of Pods exist.
  rollingUpdate: Effective when type is RollingUpdate, used to set parameters for rollingUpdate, supporting two attributes:
    maxUnavailable: Used to specify the maximum number of Pods that can be unavailable during the upgrade process, default is 25%.
    maxSurge: Used to specify the maximum number of Pods that can exceed the expected number during the upgrade process, default is 25%.
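A rolling update is normally triggered simply by changing the image; for example (the deployment name and new tag are assumptions):

kubectl set image deployment pc-deployment nginx=nginx:1.17.2 -n dev
kubectl rollout status deploy pc-deployment -n dev   # Watch the rollout progress.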

Deployment supports functions such as pausing and continuing the version upgrade process and rolling back to previous versions. Let's take a closer look:

# Version upgrade related functions
kubectl rollout <subcommand> deploy xx  # Supports the following subcommands
# status Displays the current upgrade status.
# history Displays the upgrade history.
# pause Pauses the version upgrade process.
# resume Continues the already paused version upgrade process.
# restart Restarts the version upgrade process.
# undo Rolls back to the previous version (you can use --to-revision to roll back to a specified version).

The reason Deployment can achieve version rollback is that it records the historical ReplicaSets. Once you want to roll back to that version, you only need to reduce the number of Pods of the current version to 0 and increase the number of Pods of the rollback version to the target number.

Canary Release#

Deployment supports control during the update process, such as pausing the update operation (pause) or continuing the update operation (resume).
For example, after a batch of new Pod resources are created, immediately pause the update process. At this point, only a portion of the new version of the application exists, while the majority is still the old version. Then, route a small portion of user requests to the new version of the Pod application, continue to observe whether it can run stably as expected; if there are no issues, continue to complete the rolling update of the remaining Pod resources; otherwise, immediately roll back.
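Using pause and resume, a canary release could be sketched roughly like this; the deployment name and image tag are assumptions:

# Start the update and immediately pause it, so only part of the Pods are replaced.
kubectl set image deployment pc-deployment nginx=nginx:1.17.3 -n dev && kubectl rollout pause deployment pc-deployment -n dev
# Observe: a few new-version Pods now serve a small share of the traffic.
kubectl get rs -n dev -o wide
# If everything looks good, continue the rollout; otherwise roll back.
kubectl rollout resume deployment pc-deployment -n dev
kubectl rollout undo deployment pc-deployment -n dev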

Horizontal Pod Autoscaler (HPA)#

We can manually execute the kubectl scale command to achieve scaling of Pods, but this clearly does not align with Kubernetes' goal of automation and intelligence. Kubernetes expects to automatically adjust the number of Pods based on monitoring the usage of Pods, which led to the creation of the HPA controller.
HPA can obtain the utilization of each Pod, compare it with the metrics defined in HPA, calculate the specific value needed for scaling, and finally adjust the number of Pods. In fact, HPA, like the previous Deployment, is also a type of Kubernetes resource object that determines whether to adjust the target number of Pod replicas based on tracking and analyzing the load changes of the target Pods.


If there is no program in the cluster to collect resource usage, you can choose to install metrics-server.

Test example:

apiVersion: autoscaling/v1 # Version number
kind: HorizontalPodAutoscaler # Type
metadata: # Metadata
  name: pc-hpa # Name of the deployment
  namespace: dev # Namespace
spec:
  minReplicas: 1 # Minimum number of Pods
  maxReplicas: 10 # Maximum number of Pods
  targetCPUUtilizationPercentage: 3 # CPU utilization metric
  scaleTargetRef:  # Specify the information of the Nginx to control
    apiVersion: apps/v1
    kind: Deployment
    name: nginx

DaemonSet (DS)#

The DaemonSet type of controller ensures that a replica runs on every (or specified) node in the cluster, generally suitable for log collection, node monitoring, and other scenarios. In other words, if a Pod provides functionality at the node level (needed and only needed once per node), then this type of Pod is suitable for being created using a DaemonSet type controller.

Characteristics of the DaemonSet controller:

  • Every time a node is added to the cluster, the specified Pod replica will also be added to that node.
  • When a node is removed from the cluster, the Pod will also be garbage collected.

The resource manifest for DaemonSet:

apiVersion: apps/v1 # Version number
kind: DaemonSet # Type
metadata: # Metadata
  name: # Name
  namespace: # Namespace
  labels: # Labels
    controller: daemonset
spec: # Detailed description
  revisionHistoryLimit: 3 # Retain historical versions
  updateStrategy: # Update strategy
    type: RollingUpdate # Rolling update strategy
    rollingUpdate: # Rolling update
      maxUnavailable: 1 # Maximum number of Pods that can be unavailable, can be a percentage or an integer.
  selector: # Selector, specifies which Pods this controller manages.
    matchLabels: # Labels matching rules
      app: nginx-pod
    matchExpressions: # Expressions matching rules
      - key: app
        operator: In
        values:
          - nginx-pod
  template: # Template, when the number of replicas is insufficient, Pods will be created based on the template below.
     metadata:
       labels:
         app: nginx-pod
     spec:
       containers:
         - name: nginx
           image: nginx:1.17.1
           ports:
             - containerPort: 80

Job#

Job is mainly responsible for batch processing short-lived one-time tasks.
Characteristics of Job:

  • When the Pods created by Job successfully finish executing, Job will record the number of successfully finished Pods.
  • When the number of successfully finished Pods reaches the specified number, Job will complete execution.

Job can ensure that the specified number of Pods complete execution.

The resource manifest for Job:

apiVersion: batch/v1 # Version number
kind: Job # Type
metadata: # Metadata
  name:  # Name
  namespace:  # Namespace
  labels: # Labels
    controller: job
spec: # Detailed description
  completions: 1 # Specify the total number of successful Pod runs required by Job, default is 1
  parallelism: 1 # Specify the number of Pods that should run concurrently at any given time, default is 1
  activeDeadlineSeconds: 30 # Specify the time limit for Job to run; if it exceeds this time and has not finished, the system will attempt to terminate it.
  backoffLimit: 6 # Specify the number of retries after Job fails, default is 6
  manualSelector: true # Whether to use selector to select Pods, default is false
  selector: # Selector, specifies which Pods this controller manages.
    matchLabels: # Labels matching rules
      app: counter-pod
    matchExpressions: # Expressions matching rules
      - key: app
        operator: In
        values:
          - counter-pod
  template: # Template, when the number of replicas is insufficient, Pods will be created based on the template below.
     metadata:
       labels:
         app: counter-pod
     spec:
       restartPolicy: Never # Restart policy can only be set to Never or OnFailure
       containers:
         - name: counter
           image: busybox:1.30
           command: ["/bin/sh","-c","for i in 9 8 7 6 5 4 3 2 1;do echo $i;sleep 20;done"]

Explanation of the restart policy in the template:

  • If set to OnFailure, Job will restart the container when the Pod fails, rather than creating a new Pod, and the failed count remains unchanged.
  • If set to Never, Job will create a new Pod when the Pod fails, and the failed Pod will not disappear or restart, and the failed count will increase by 1.
  • If specified as Always, it means it will keep restarting, which means the Pod task will be executed repeatedly, conflicting with the definition of Job, so it cannot be set to Always.

CronJob (CJ)#

The CronJob controller manages Job controller resources, and the Job controller in turn manages Pod objects. A Job's task starts executing immediately after the Job resource is created, whereas a CronJob controls when and how often the job runs, much like cron-based task scheduling on a Linux system. In other words, a CronJob can run job tasks repeatedly at specific points in time.

The resource manifest file for CronJob:

apiVersion: batch/v1beta1 # Version number
kind: CronJob # Type       
metadata: # Metadata
  name: # cronjob name
  namespace: # Namespace 
  labels: # Labels
    controller: cronjob
spec: # Detailed description
  schedule: # Cron format job scheduling running time point, used to control when the task is executed.
  concurrencyPolicy: # Concurrency execution policy, used to define whether and how to run the next job when the previous job is still running.
  failedJobsHistoryLimit: # The number of historical records to retain for failed job executions, default is 1.
  successfulJobsHistoryLimit: # The number of historical records to retain for successful job executions, default is 3.
  startingDeadlineSeconds: # Timeout duration for starting job errors.
  jobTemplate: # Job controller template, used to generate job objects for the cronjob controller; below is actually the definition of the job.
    metadata:
    spec:
      completions: 1
      parallelism: 1
      activeDeadlineSeconds: 30
      backoffLimit: 6
      manualSelector: true
      selector:
        matchLabels:
          app: counter-pod
        matchExpressions: # Expressions matching rules
          - {key: app, operator: In, values: [counter-pod]}
      template:
        metadata:
          labels:
            app: counter-pod
        spec:
          restartPolicy: Never 
          containers:
          - name: counter
            image: busybox:1.30
            command: ["bin/sh","-c","for i in 9 8 7 6 5 4 3 2 1; do echo $i;sleep 20;done"]

Several options that need to be explained in detail:
schedule: Cron expression used to specify the execution time of the task.
*/1 * * * *

Minute values range from 0 to 59.
Hour values range from 0 to 23.
Day values range from 1 to 31.
Month values range from 1 to 12.
Week values range from 0 to 6, where 0 represents Sunday.
Multiple times can be separated by commas; ranges can be given with hyphens; * can be used as a wildcard; / indicates every...

concurrencyPolicy:
Allow: Allows Jobs to run concurrently (default).
Forbid: Prohibits concurrent running; if the previous run has not completed, the next run will be skipped.
Replace: Cancels the currently running job and replaces it with a new job.
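A few illustrative schedule values (standard cron syntax; the examples are not from the original case):

schedule: "*/1 * * * *"   # Every minute.
schedule: "0 2 * * *"     # At 02:00 every day.
schedule: "0 0 * * 0"     # At midnight every Sunday.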


StatefulSet (Stateful)#

Stateless applications:

- Consider all Pods the same.
- No order requirements.
- Do not need to consider which Node to run on.
- Can scale and expand freely.

Stateful applications:

- Have order requirements.
- Consider each Pod to be unique.
- Need to consider which Node to run on.
- Need to scale and expand in order.
- Ensure that each Pod is independent, maintaining the startup order and uniqueness of Pods.

StatefulSet is a load management controller provided by Kubernetes for managing stateful applications.
StatefulSet deployment requires a Headless Service.

> Why is a Headless Service needed?
> 
> - When using a Deployment, each Pod name contains a random string, so Pod names are unordered. In a StatefulSet, by contrast, names must be ordered, each Pod cannot be arbitrarily replaced, and a rebuilt Pod keeps the same name.
> - Since Pod IPs are variable, they are identified by Pod names. Pod names are unique identifiers for Pods and must be persistently stable and valid. This is where a Headless Service comes into play, which can give each Pod a unique name.
> 
> StatefulSet is commonly used to deploy RabbitMQ clusters, Zookeeper clusters, MySQL clusters, Eureka clusters, etc.

Demonstration example:

apiVersion: v1
kind: Service
metadata:
  name: service-headlessness
  namespace: dev
spec:
  selector:
    app: nginx-pod
  clusterIP: None # Set clusterIP to None to create a headless Service.
  type: ClusterIP
  ports:
    - port: 80 # Port of the Service.
      targetPort: 80 # Port of the Pod.
---

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pc-statefulset
  namespace: dev
spec:
  replicas: 3
  serviceName: service-headlessness
  selector:
    matchLabels:
      app: nginx-pod
  template:
    metadata:
      labels:
        app: nginx-pod
    spec:
      containers:
        - name: nginx
          image: nginx:1.17.1
          ports:
            - containerPort: 80

Service#

In Kubernetes, Pods are the carriers of applications, and we can access applications through the IP of the Pod. However, the IP address of the Pod is not fixed, which means it is inconvenient to directly use the Pod's IP to access services.
To solve this problem, Kubernetes provides the Service resource, which aggregates multiple Pods providing the same service and provides a unified entry address. By accessing the entry address of the Service, you can access the underlying Pod services.

In many cases, a Service is just a concept; the component that actually does the work is the kube-proxy process running on each Node. When a Service is created, its information is written to etcd through the API Server; kube-proxy detects the change via its watch mechanism and converts the latest Service information into the corresponding access rules.


Kube-proxy currently supports three working modes:

  • userspace mode:

    • In userspace mode, kube-proxy creates a listening port for each Service. Requests sent to the Cluster IP are redirected to the port listened to by kube-proxy via iptables rules. Kube-proxy selects a Pod providing the service based on the LB algorithm (load balancing algorithm) and establishes a connection to forward the request to the Pod.

    • In this mode, kube-proxy acts as a layer 4 load balancer. Since kube-proxy runs in userspace, data copying between the kernel and user space increases during forwarding processing, making it stable but very inefficient.


  • iptables mode:

    • In iptables mode, kube-proxy creates corresponding iptables rules for each Pod in the Service backend, directly redirecting requests sent to the Cluster IP to the IP of a Pod.

    • In this mode, kube-proxy does not act as a layer 4 load balancer; it only creates iptables rules. The advantage of this mode is that it is more efficient than userspace mode, but it cannot provide flexible LB strategies and cannot retry when backend Pods become unavailable.


  • ipvs mode:

    • The ipvs mode is similar to iptables; kube-proxy monitors changes in Pods and creates corresponding ipvs rules. IPVS is more efficient in forwarding than iptables and supports more LB (load balancing) algorithms.


Service Types#

The resource manifest for Service:

apiVersion: v1 # Version
kind: Service # Type
metadata: # Metadata
  name: # Resource name
  namespace: # Namespace
spec:
  selector: # Label selector, used to determine which Pods the current Service proxies.
    app: nginx
  type: NodePort # Type of Service, specifies the access method for the Service.
  clusterIP: # Virtual service IP address.
  sessionAffinity: # Session affinity, supports ClientIP and None options, default is None.
  ports: # Port information
    - port: 8080 # Service port.
      protocol: TCP # Protocol.
      targetPort: # Pod port.
      nodePort: # Host port.

Explanation of spec.type:

  • ClusterIP: Default value, it is a virtual IP automatically assigned by the Kubernetes system, accessible only within the cluster.
  • NodePort: Exposes the Service through a specified port on the Node, allowing access to the service from outside the cluster.
  • LoadBalancer: Uses an external load balancer to distribute the load to the service; note that this mode requires support from external cloud environments.
  • ExternalName: Introduces external services into the cluster, allowing direct access to the external service using this Service.

ClusterIP Type Service#

Endpoints (not commonly used in practice)

  • Endpoints are a resource object in Kubernetes, stored in etcd, used to record the access addresses of all Pods corresponding to a service. It is generated based on the selector described in the service configuration file.
  • A service consists of a set of Pods, and these Pods are exposed through Endpoints. In other words, the connection between the service and Pods is realized through Endpoints.


Load Balancing Strategy

Access to the Service is distributed to the backend Pods. Currently, Kubernetes provides two load balancing strategies:

  • If not defined, the default uses the strategy of kube-proxy, such as random, polling, etc.
  • Session persistence mode based on client addresses, meaning that all requests initiated from the same client will be forwarded to a fixed Pod. This is friendly for traditional session-based authentication projects, and this mode can be enabled by adding sessionAffinity: ClientIP in spec.

Headless Type Service#

In certain scenarios, developers may not want to use the load balancing functionality provided by Service but prefer to control the load balancing strategy themselves. For this situation, Kubernetes provides Headless Service, which does not allocate Cluster IP. If you want to access the Service, you can only query it through the Service's domain name.
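For example, a headless Service is resolved through DNS from inside the cluster; the following sketch assumes the service-headlessness Service from the StatefulSet example exists, and uses a temporary busybox Pod for the lookup:

kubectl run dns-test -n dev --image=busybox:1.30 --rm -it --restart=Never -- nslookup service-headlessness
# The query returns the IPs of the backing Pods directly, instead of a single cluster IP.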

NodePort Type Service#

In the previous example, the IP address of the created Service can only be accessed within the cluster. If you want to expose the Service for use outside the cluster, you need to use another type of Service called NodePort. The working principle of NodePort is to map the Service's port to a port on the Node, allowing access to the Service via NodeIP.
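A sketch of a NodePort Service; the nodePort value is an assumption and must fall within the cluster's node port range (30000-32767 by default):

apiVersion: v1
kind: Service
metadata:
  name: service-nodeport
  namespace: dev
spec:
  selector:
    app: nginx-pod
  type: NodePort # Service type.
  ports:
    - port: 80         # Service port.
      targetPort: 80   # Pod port.
      nodePort: 30002  # Port exposed on every Node; omit it to let Kubernetes pick one.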


LoadBalancer Type Service#

The LoadBalancer type uses an external load balancer, typically provided by the cloud environment, to distribute traffic to the Service; it requires support from devices outside the Kubernetes cluster.

ExternalName Type Service#

The ExternalName type Service is used to introduce external services into the cluster. It specifies the address of a service through the externalName attribute, allowing access to the external service from within the cluster using this Service.
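A sketch of an ExternalName Service; the external domain below is just a placeholder:

apiVersion: v1
kind: Service
metadata:
  name: service-externalname
  namespace: dev
spec:
  type: ExternalName
  externalName: www.example.com # The cluster-internal name resolves to this external address via a CNAME record.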


Ingress#

As we already know, the main ways to expose services outside the cluster are NodePort and LoadBalancer. However, both of these methods have certain drawbacks:

  • The drawback of NodePort is that it occupies many ports on the cluster machines, which becomes increasingly evident as the number of services in the cluster increases.
  • The drawback of LoadBalancer is that each Service requires a LB, which is wasteful, troublesome, and requires support from devices outside Kubernetes.

Based on this situation, Kubernetes provides the Ingress resource object, which can meet the need to expose multiple Services with just one NodePort or one LB. The working mechanism is roughly illustrated in the following diagram:


In fact, Ingress is equivalent to a layer 7 load balancer, which is an abstraction of reverse proxy in Kubernetes. Its working principle is similar to Nginx; you can understand that Ingress establishes many mapping rules, and the Ingress Controller listens for changes in these configuration rules and converts them into Nginx reverse proxy configurations, then provides services externally.

  • Ingress: An object in Kubernetes that defines the rules for how requests are forwarded to Services.
  • Ingress Controller: The program that implements reverse proxy and load balancing, parses the rules defined by Ingress, and implements request forwarding based on the configured rules. There are many implementations, such as Nginx, Contour, Haproxy, etc.

The working principle of Ingress (with Nginx) is as follows:

  • Users write Ingress rules, indicating which domain name corresponds to which Service in the Kubernetes cluster.
  • The Ingress controller dynamically perceives changes in Ingress service rules and generates a corresponding Nginx reverse proxy configuration.
  • The Ingress controller writes the generated Nginx configuration into a running Nginx service and updates it dynamically.
  • At this point, what is really working is an Nginx that has configured the user-defined request rules.


Ingress supports HTTP and HTTPS proxy.
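As a sketch, an HTTP Ingress rule mapping a domain to a Service might look like this. The apiVersion depends on the cluster version (networking.k8s.io/v1 for Kubernetes 1.19+), an Ingress Controller must already be installed, and the host, Service name, and ingressClassName are assumptions:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-http
  namespace: dev
spec:
  ingressClassName: nginx
  rules:
    - host: nginx.example.com # Requests for this domain...
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-service # ...are forwarded to this Service.
                port:
                  number: 80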

Data Storage#

As mentioned earlier, the lifecycle of containers can be very short, and they may be frequently created and destroyed. When a container is destroyed, the data stored in the container will also be cleared. This outcome is undesirable for users in certain situations. To persistently save container data, Kubernetes introduces the concept of Volume.

A Volume is a shared directory that can be accessed by multiple containers in a Pod. It is defined on the Pod and then mounted to specific file directories by multiple containers within a Pod. Kubernetes uses Volumes to achieve data sharing between different containers in the same Pod and to provide persistent storage. The lifecycle of a Volume is not tied to the lifecycle of individual containers in the Pod; when a container terminates or restarts, the data in the Volume will not be lost.

Kubernetes' Volumes support various types, with the following being the most common:

  • Basic storage: EmptyDir, HostPath, NFS
  • Advanced storage: PV, PVC
  • Configuration storage: ConfigMap, Secret

Basic Storage#

EmptyDir#

EmptyDir is the most basic type of Volume; an EmptyDir is an empty directory on the Host.

EmptyDir is created when the Pod is assigned to a Node, its initial content is empty, and there is no need to specify a corresponding directory file on the host, as Kubernetes will automatically allocate a directory. When the Pod is destroyed, the data in the EmptyDir will also be permanently deleted. The uses of EmptyDir include:

  • Temporary space, such as a temporary directory required by certain applications during runtime, which does not need to be permanently retained.
  • A directory from which one container needs to obtain data from another container (multi-container shared directory).

Next, let's use EmptyDir in a case of file sharing between containers.

In a Pod, prepare two containers, nginx and busybox, and declare a Volume to be mounted to the directories of both containers. The nginx container is responsible for writing logs to the Volume, while busybox reads the log content to the console via command.
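A sketch of the manifest for this case; the Pod name, mount paths, and the tail command follow the description above:

apiVersion: v1
kind: Pod
metadata:
  name: volume-emptydir
  namespace: dev
spec:
  containers:
    - name: nginx
      image: nginx:1.17.1
      ports:
        - containerPort: 80
      volumeMounts: # Mount logs-volume at the Nginx log directory.
        - name: logs-volume
          mountPath: /var/log/nginx
    - name: busybox
      image: busybox:1.30
      command: ["/bin/sh", "-c", "touch /logs/access.log; tail -f /logs/access.log"] # Print the log content to the console.
      volumeMounts: # Mount the same volume at /logs.
        - name: logs-volume
          mountPath: /logs
  volumes: # Declare the shared volume.
    - name: logs-volume
      emptyDir: {}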


HostPath#

As mentioned above, data in an EmptyDir is not persisted; it is destroyed along with the Pod. If you simply want to persist data to the host, you can use HostPath.

HostPath mounts an actual directory from the Node host into the Pod for the containers to use, ensuring that even if the Pod is destroyed, the data can still exist on the Node host.
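Compared with the emptyDir example, only the volumes section changes; a minimal sketch, with the path and type as assumptions:

  volumes:
    - name: logs-volume
      hostPath:
        path: /root/logs          # Directory on the Node host.
        type: DirectoryOrCreate   # Create the directory if it does not exist.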


NFS#

While HostPath can solve the problem of data persistence, if a Node fails and the Pod is moved to another Node, issues may arise again. At this point, a separate network storage system is needed, commonly using NFS or CIFS.

NFS is a network file storage system; you can set up an NFS server and directly connect the storage in the Pod to the NFS system. This way, regardless of how the Pod moves between nodes, as long as the Node can connect to the NFS, the data can be successfully accessed.
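Again only the volumes section changes; a sketch assuming an NFS server at 192.168.18.100 exporting /root/data/nfs (both values are placeholders):

  volumes:
    - name: logs-volume
      nfs:
        server: 192.168.18.100   # NFS server address.
        path: /root/data/nfs     # Shared path on the NFS server.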


Advanced Storage#

We have learned to use NFS for storage, but this requires users to set up the NFS system themselves and configure it in the YAML. Since Kubernetes supports many storage systems, it is unrealistic to expect users to master all of them. To abstract away the underlying storage implementation details and make storage easier to consume, Kubernetes introduces two resource objects: PV and PVC.

  • PV (Persistent Volume) refers to a persistent volume, which is an abstraction of the underlying shared storage. Generally, PV is created and configured by Kubernetes administrators, and it is related to the specific shared storage technology, interfacing with shared storage through plugins.
  • PVC (Persistent Volume Claim) refers to a persistent volume claim, which is a declaration of the user's storage requirements. In other words, PVC is essentially a resource request made by the user to the Kubernetes system.


After using PV and PVC, the work can be further subdivided:

  • Storage: Maintained by storage engineers.
  • PV: Maintained by Kubernetes administrators.
  • PVC: Maintained by Kubernetes users.

PV#

PV is an abstraction of storage resources. Below is the resource manifest file:

apiVersion: v1  
kind: PersistentVolume
metadata:
  name: pv2
spec:
  nfs: # Storage type, corresponding to the underlying actual storage.
  capacity:  # Storage capacity, currently only supports setting storage space.
    storage: 2Gi
  accessModes:  # Access modes.
  storageClassName: # Storage class.
  persistentVolumeReclaimPolicy: # Reclamation policy.

Key configuration parameters for PV:

  • Storage Type

    The type of the underlying actual storage; Kubernetes supports various storage types, and the configuration for each storage type varies.

  • Storage Capacity (capacity)

    Currently only supports setting storage space (storage=1Gi), but may include IOPS, throughput, and other metrics in the future.

  • Access Modes (accessModes)

    Describes the access permissions of user applications to storage resources. The access permissions include the following methods:

    • ReadWriteOnce (RWO): Read-write permission, but can only be mounted by a single node.
    • ReadOnlyMany (ROX): Read-only permission, can be mounted by multiple nodes.
    • ReadWriteMany (RWX): Read-write permission, can be mounted by multiple nodes.

    Note that different underlying storage types may support different access modes.

  • Reclamation Policy (persistentVolumeReclaimPolicy)

    When PV is no longer in use, the handling method for it. Currently, three strategies are supported:

    • Retain (keep): Retain data; requires manual cleanup by the administrator.
    • Recycle (reclaim): Clear data in PV, effectively executing rm -rf /thevolume/*.
    • Delete (delete): Perform the deletion operation of the volume associated with the PV in the backend storage, which is common in cloud service providers' storage services.

    Note that different underlying storage types may support different reclamation strategies.

  • Storage Class

    PV can specify a storage class through the storageClassName parameter.

    • PV with a specific class can only be bound to PVC requesting that class.
    • PV without a specified class can only be bound to PVC not requesting any class.
  • Status (status)

    A PV may be in four different stages during its lifecycle:

    • Available: Indicates available status; it has not yet been bound to any PVC.
    • Bound: Indicates that the PV has been bound to a PVC.
    • Released: Indicates that the PVC has been deleted, but the resource has not yet been reclaimed by the cluster.
    • Failed: Indicates that the automatic reclamation of the PV has failed.
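
Putting the parameters above together, a complete NFS-backed PV might look like the following sketch (the capacity, server address, and path are illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  capacity:
    storage: 2Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.5.6    # Illustrative NFS server address.
    path: /root/data/pv1   # Illustrative exported directory.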

PVC#

PVC is a resource request used to declare the requirements for storage space, access modes, and storage class. Below is the resource manifest file:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc
  namespace: dev
spec:
  accessModes: # Access modes.
  selector: # Use labels to select PV.
  storageClassName: # Storage class.
  resources: # Request space.
    requests:
      storage: 5Gi

Key configuration parameters for PVC:

  • Access Modes (accessModes)

    Describes the access permissions of user applications to storage resources.

  • Selection Criteria (selector)

    Through the setting of Label Selector, PVC can filter existing PVs in the system.

  • Storage Class (storageClassName)

    PVC can specify the required class of backend storage when defined; only PVs with that class can be selected by the system.

  • Resource Requests (Resources)

    Describes the request for storage resources.
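
For example, a PVC requesting 1Gi of ReadWriteMany storage in the dev namespace might be written as follows (a sketch; the name and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-nfs
  namespace: dev
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi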

Lifecycle#

PVC and PV correspond one-to-one, and the interaction between PV and PVC follows the lifecycle below:

  • Resource Supply: The administrator manually creates the underlying storage and PV.

  • Resource Binding: The user creates PVC, and Kubernetes is responsible for finding PV based on the PVC declaration and binding it.

    Once the user defines the PVC, the system will select a PV that meets the storage resource request from the existing PVs.

    • Once found, it will bind the PV to the user-defined PVC, and the user's application can use this PVC.
    • If not found, the PVC will remain in Pending status indefinitely until the system administrator creates a PV that meets its requirements.

    Once a PV is bound to a PVC, it will be exclusively occupied by that PVC and cannot be bound to other PVCs.

  • Resource Usage: Users can use PVC in Pods like a volume.

    The Pod mounts the PVC to a path in the container through its volume definition (see the sketch after this list).

  • Resource Release: Users delete PVC to release PV.

    When the storage resource is no longer needed, the user can delete the PVC, and the PV bound to that PVC will be marked as "released," but it cannot be immediately bound to other PVCs. The data written by the previous PVC may still remain on the storage device, and only after clearing can the PV be reused.

  • Resource Reclamation: Kubernetes reclaims resources based on the reclamation policy set for PV.

    For PV, the administrator can set the reclamation policy to determine how to handle leftover data after the PVC bound to it is released. Only after the storage space of the PV is reclaimed can it be bound and used by new PVCs.
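
A minimal sketch of using a PVC as a volume in a Pod (the claim name matches the pvc-nfs sketch above; the image and mount path are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: pod-pvc
  namespace: dev
spec:
  containers:
  - name: busybox
    image: busybox:1.30
    command: ["/bin/sh", "-c", "tail -f /dev/null"]
    volumeMounts:
    - name: volume
      mountPath: /root/
  volumes:
  - name: volume
    persistentVolumeClaim:
      claimName: pvc-nfs
      readOnly: false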

23

Configuration Storage#

ConfigMap#

ConfigMap is a special type of storage volume primarily used to store configuration information.

Create a configmap.yaml with the following content:

apiVersion: v1
kind: ConfigMap
metadata:
  name: configmap
  namespace: dev
data:
  info: |
    username:admin
    password:123456

Next, create the ConfigMap using this configuration file.

# Create configmap
[root@k8s-master01 ~]# kubectl create -f configmap.yaml
configmap/configmap created

# View configmap details
[root@k8s-master01 ~]# kubectl describe cm configmap -n dev
Name:         configmap
Namespace:    dev
Labels:       <none>
Annotations:  <none>

Data
====
info:
----
username:admin
password:123456

Events:  <none>

Next, create a pod-configmap.yaml and mount the created ConfigMap into it.

apiVersion: v1
kind: Pod
metadata:
  name: pod-configmap
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
    volumeMounts: # Mount the configmap to the directory
    - name: config
      mountPath: /configmap/config
  volumes: # Reference the configmap
  - name: config
    configMap:
      name: configmap
# Create pod
[root@k8s-master01 ~]# kubectl create -f pod-configmap.yaml
pod/pod-configmap created

# View pod
[root@k8s-master01 ~]# kubectl get pod pod-configmap -n dev
NAME            READY   STATUS    RESTARTS   AGE
pod-configmap   1/1     Running   0          6s

# Enter the container
[root@k8s-master01 ~]# kubectl exec -it pod-configmap -n dev /bin/sh
# cd /configmap/config/
# ls
info
# more info
username:admin
password:123456

# You can see that the mapping has been successful; each configmap is mapped to a directory.
# The key represents the file, and the value represents the content of the file.
# At this point, if the content of the configmap is updated, the values in the container will also be updated dynamically.

Secret#

In Kubernetes, there is another object very similar to ConfigMap called Secret. It is mainly used to store sensitive information, such as passwords, keys, certificates, etc.

First, encode the data using base64.

[root@k8s-master01 ~]# echo -n 'admin' | base64 # Prepare username
YWRtaW4=
[root@k8s-master01 ~]# echo -n '123456' | base64 # Prepare password
MTIzNDU2

Next, write secret.yaml and create the Secret.

apiVersion: v1
kind: Secret
metadata:
  name: secret
  namespace: dev
type: Opaque
data:
  username: YWRtaW4=
  password: MTIzNDU2
# Create secret
[root@k8s-master01 ~]# kubectl create -f secret.yaml
secret/secret created

# View secret details
[root@k8s-master01 ~]# kubectl describe secret secret -n dev
Name:         secret
Namespace:    dev
Labels:       <none>
Annotations:  <none>
Type:  Opaque
Data
====
password:  6 bytes
username:  5 bytes

Create pod-secret.yaml and mount the created secret into it:

apiVersion: v1
kind: Pod
metadata:
  name: pod-secret
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
    volumeMounts: # Mount the secret to the directory
    - name: config
      mountPath: /secret/config
  volumes:
  - name: config
    secret:
      secretName: secret
# Create pod
[root@k8s-master01 ~]# kubectl create -f pod-secret.yaml
pod/pod-secret created

# View pod
[root@k8s-master01 ~]# kubectl get pod pod-secret -n dev
NAME            READY   STATUS    RESTARTS   AGE
pod-secret      1/1     Running   0          2m28s

# Enter the container and check the secret information, which has been automatically decoded.
[root@k8s-master01 ~]# kubectl exec -it pod-secret /bin/sh -n dev
/ # ls /secret/config/
password  username
/ # more /secret/config/username
admin
/ # more /secret/config/password
123456

With this, we have stored sensitive information in a Secret in encoded form and consumed it, automatically decoded, inside the Pod.

Security Authentication#

As a tool for managing distributed clusters, Kubernetes treats cluster security as one of its most important tasks. Security here essentially means authenticating and authorizing the various clients that access Kubernetes.

Clients

In a Kubernetes cluster, there are typically two types of clients:

  • User Account: Generally managed by other services outside Kubernetes.
  • Service Account: Accounts managed by Kubernetes, used to provide identity for service processes in Pods when accessing Kubernetes.

24

Authentication, Authorization, and Admission Control

ApiServer is the only entry point for accessing and managing resource objects. Any request to access ApiServer goes through the following three processes:

  • Authentication: Identity verification; only the correct account can pass authentication.
  • Authorization: Determines whether the user has permission to perform specific actions on the accessed resources.
  • Admission Control: Used to supplement the authorization mechanism to achieve more fine-grained access control functionality.

25

Authentication Management#

The key to securing a Kubernetes cluster lies in identifying and authenticating client identities. Kubernetes provides three client authentication methods:

  • HTTP Basic Authentication: Authentication via username + password.

This authentication method encodes the "username:password" string using the BASE64 algorithm and sends it to the server in the Authorization header of the HTTP request. The server decodes it to obtain the username and password and then performs user identity authentication.

  • HTTP Token Authentication: Identifies legitimate users through a Token.

This authentication method uses a long, hard-to-guess string, the Token, to identify the client. Each Token corresponds to a username. When the client initiates an API call, it includes the Token in the HTTP header; the API Server compares the received Token with the tokens stored on the server and then performs user identity authentication (a sketch of such a request appears after this list).

  • HTTPS Certificate Authentication: A two-way digital certificate authentication method based on CA root certificate signatures.

This authentication method is the most secure but also the most complicated to operate.
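
A sketch of what a Token-authenticated request could look like (the apiserver address and token file path are placeholders, not values from this document):

# Read a token from a hypothetical file and call the apiserver with it.
TOKEN=$(cat /path/to/token)
curl -k -H "Authorization: Bearer ${TOKEN}" https://192.168.5.3:6443/api/v1/namespaces/dev/pods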

26

The HTTPS authentication process can be roughly divided into three steps:

  1. Certificate application and issuance.

    Both parties in HTTPS communication apply for certificates from CA organizations, which issue root certificates, server certificates, and private keys to the applicants.

  2. Mutual authentication between client and server.

(1) The client initiates a request to the server, and the server sends its certificate to the client. The client verifies the certificate against the CA root certificate and extracts the server's public key from it; if verification succeeds, the client trusts the server.
(2) The client then sends its own certificate to the server. The server verifies it in the same way and extracts the client's public key from it, thereby confirming whether the client is legitimate.

  3. Communication between the server and client.

After negotiating an encryption scheme, the client generates a random session key, encrypts it with the server's public key, and sends it to the server. Once the server receives and decrypts this key, all subsequent communication between the two parties is encrypted with it.

Note: Kubernetes allows multiple authentication methods to be configured simultaneously; as long as any one method passes authentication, it is sufficient.

Authorization Management#

Authorization occurs after successful authentication. Once authentication is successful, Kubernetes will determine whether the user has permission to access resources based on predefined authorization policies. This process is called authorization.

Each request sent to ApiServer carries information about the user and resources, such as the user making the request, the request path, the request action, etc. Authorization compares this information with the authorization policy; if it meets the policy, authorization is considered successful; otherwise, an error is returned.

The API Server currently supports the following authorization policies:

  • AlwaysDeny: Denies all requests, generally used for testing.
  • AlwaysAllow: Allows all requests, equivalent to no authorization process for the cluster (the default policy in Kubernetes).
  • ABAC: Attribute-Based Access Control, which uses user-defined authorization rules to match and control user requests.
  • Webhook: Authorizes users by calling an external REST service.
  • Node: A special-purpose mode that authorizes API requests made by kubelets.
  • RBAC: Role-Based Access Control (the default option under kubeadm installation).

RBAC (Role-Based Access Control) mainly describes one thing: which objects are granted which permissions.

This involves the following concepts:

  • Objects: User, Groups, ServiceAccount.
  • Roles: Represent a collection of actions (permissions) defined on resources.
  • Bindings: Bind defined roles to objects.

27

RBAC introduces four top-level resource objects:

  • Role, ClusterRole: Roles used to specify a set of permissions.
  • RoleBinding, ClusterRoleBinding: Role bindings used to assign roles (permissions) to objects.

Role, ClusterRole

A role is a collection of permissions. The permissions here are all in the form of allowances (whitelists).

# Role can only authorize resources within the namespace and requires specifying the namespace.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  namespace: dev
  name: authorization-role
rules:
- apiGroups: [""]  # Supported API group list; "" empty string indicates core API group.
  resources: ["pods"] # Supported resource object list.
  verbs: ["get", "watch", "list"] # Allowed operations on resource objects.
# ClusterRole can authorize resources at the cluster level, across namespaces, and non-resource types.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
 name: authorization-clusterrole
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]

The parameters under rules deserve a brief explanation:

  • apiGroups: Supported API group list.
  "","apps", "autoscaling", "batch"
  • resources: Supported resource object list.
  "services", "endpoints", "pods","secrets","configmaps","crontabs","deployments","jobs",
  "nodes","rolebindings","clusterroles","daemonsets","replicasets","statefulsets",
  "horizontalpodautoscalers","replicationcontrollers","cronjobs"
  • verbs: List of allowed operations on resource objects.
  "get", "list", "watch", "create", "update", "patch", "delete", "exec"

RoleBinding, ClusterRoleBinding

Role bindings are used to bind a role to a target object. The binding target can be a User, Group, or ServiceAccount.

# RoleBinding can bind a subject in a namespace to a Role (or reference a ClusterRole), granting that subject the permissions defined by the role.
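# The rest of this example is a minimal sketch; the subject name "dev-user" is illustrative and not from this document.
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: authorization-role-binding
  namespace: dev
subjects:
- kind: User
  name: dev-user                      # Illustrative user name.
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: authorization-role            # The Role defined above.
  apiGroup: rbac.authorization.k8s.io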