k8s-ephemeral-storage-metrics

K8s Ephemeral Storage Metrics


A Prometheus ephemeral storage metrics exporter for pods, containers, nodes, and volumes.

This project was created to address the lack of ephemeral storage monitoring in Kubernetes.

This project does not monitor CSI-backed ephemeral storage, e.g. generic ephemeral volumes.


Helm Install

helm repo add k8s-ephemeral-storage-metrics https://jmcgrath207.github.io/k8s-ephemeral-storage-metrics/chart
helm repo update
helm upgrade --install my-deployment k8s-ephemeral-storage-metrics/k8s-ephemeral-storage-metrics

Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| affinity | object | {} |  |
| containerSecurityContext.allowPrivilegeEscalation | bool | false |  |
| containerSecurityContext.capabilities.drop[0] | string | "ALL" |  |
| containerSecurityContext.privileged | bool | false |  |
| containerSecurityContext.readOnlyRootFilesystem | bool | false |  |
| containerSecurityContext.runAsNonRoot | bool | true |  |
| deploy_type | string | "Deployment" | Set to Deployment for a single controller that queries all nodes, or to Daemonset |
| dev | object | {"enabled":false,"grow":{"image":"ghcr.io/jmcgrath207/k8s-ephemeral-storage-grow-test:latest","imagePullPolicy":"IfNotPresent"},"shrink":{"image":"ghcr.io/jmcgrath207/k8s-ephemeral-storage-shrink-test:latest","imagePullPolicy":"IfNotPresent"}} | For local development or testing; deploys grow and shrink pods and a debug service |
| image.imagePullPolicy | string | "IfNotPresent" |  |
| image.imagePullSecrets | list | [] |  |
| image.repository | string | "ghcr.io/jmcgrath207/k8s-ephemeral-storage-metrics" |  |
| image.tag | string | "1.15.0" |  |
| interval | int | 15 | Rate at which the exporter polls nodes |
| kubelet | object | {"insecure":false,"readOnlyPort":0,"scrape":false} | Scrape metrics through the kubelet instead of the Kubernetes API |
| log_level | string | "info" |  |
| max_node_concurrency | int | 10 | Maximum number of concurrent query requests to the Kubernetes API |
| metrics | object | {"adjusted_polling_rate":false,"ephemeral_storage_container_limit_percentage":true,"ephemeral_storage_container_volume_limit_percentage":true,"ephemeral_storage_container_volume_usage":true,"ephemeral_storage_node_available":true,"ephemeral_storage_node_capacity":true,"ephemeral_storage_node_percentage":true,"ephemeral_storage_pod_usage":true,"port":9100} | Select which metrics to enable |
| metrics.adjusted_polling_rate | bool | false | Create the ephemeral_storage_adjusted_polling_rate metric, which reports the adjusted poll rate in milliseconds. Typically used for testing. |
| metrics.ephemeral_storage_container_limit_percentage | bool | true | Percentage of ephemeral storage used by a container in a pod |
| metrics.ephemeral_storage_container_volume_limit_percentage | bool | true | Percentage of ephemeral storage used by a container’s volume in a pod |
| metrics.ephemeral_storage_container_volume_usage | bool | true | Current ephemeral storage used by a container’s volume in a pod |
| metrics.ephemeral_storage_node_available | bool | true | Available ephemeral storage for a node |
| metrics.ephemeral_storage_node_capacity | bool | true | Capacity of ephemeral storage for a node |
| metrics.ephemeral_storage_node_percentage | bool | true | Percentage of ephemeral storage used on a node |
| metrics.ephemeral_storage_pod_usage | bool | true | Current ephemeral storage usage of a pod, in bytes |
| metrics.port | int | 9100 | Adjust the metrics port as needed (default 9100) |
| nodeSelector | object | {} |  |
| podAnnotations | object | {} |  |
| podSecurityContext.runAsNonRoot | bool | true |  |
| podSecurityContext.seccompProfile.type | string | "RuntimeDefault" |  |
| pprof | bool | false | Enable pprof |
| priorityClassName | string | nil |  |
| prometheus.enable | bool | true |  |
| prometheus.release | string | "kube-prometheus-stack" |  |
| prometheus.rules.enable | bool | false | Create PrometheusRules that fire alerts when ephemeral storage runs out |
| prometheus.rules.labels | object | {"severity":"warning"} | Additional labels to set on alerts |
| prometheus.rules.predictFilledHours | int | 12 | How many hours ahead to predict a volume filling up |
| serviceMonitor | object | {"additionalLabels":{},"enable":true,"metricRelabelings":[],"podTargetLabels":[],"relabelings":[],"targetLabels":[]} | Configure the ServiceMonitor |
| serviceMonitor.additionalLabels | object | {} | Add labels to the ServiceMonitor.Spec |
| serviceMonitor.metricRelabelings | list | [] | Set metricRelabelings as per https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.RelabelConfig |
| serviceMonitor.podTargetLabels | list | [] | Set podTargetLabels as per https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.ServiceMonitorSpec |
| serviceMonitor.relabelings | list | [] | Set relabelings as per https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.RelabelConfig |
| serviceMonitor.targetLabels | list | [] | Set targetLabels as per https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.ServiceMonitorSpec |
| tolerations | list | [] |  |
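
As a rough illustration of overriding the defaults listed above, a values file might look like the sketch below. The key names and types come from the Values list; the file name, the chosen values, and the `team` label are assumptions made for the example, not recommendations from the project.

```yaml
# my-values.yaml (hypothetical file name)
# Only keys documented in the Values list are used; the values are illustrative.
deploy_type: Deployment        # chart default, shown for clarity
interval: 30                   # changed from the default of 15
max_node_concurrency: 5        # changed from the default of 10

metrics:
  port: 9100                   # chart default
  ephemeral_storage_node_capacity: false   # disable a single metric

prometheus:
  enable: true
  release: kube-prometheus-stack           # chart default
  rules:
    enable: true               # default is false
    predictFilledHours: 24     # default is 12
    labels:
      severity: warning        # chart default

serviceMonitor:
  additionalLabels:
    team: platform             # hypothetical label
```

A file like this could be passed to the install command from the Helm Install section by appending `-f my-values.yaml`, Helm's standard flag for supplying a values file.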

Prometheus alert rules

To prevent multiple kinds of alerts from firing for a single container or emptyDir volume when both prometheus.enable and prometheus.rules.enable are on, add the following inhibition rules to your Alertmanager config:

- source_matchers:
    - alertname="EphemeralStorageVolumeFilledUp"
  target_matchers:
    - severity="warning"
    - alertname="EphemeralStorageVolumeFillingUp"
  equal:
    - pod_namespace
    - pod_name
    - volume_name
- source_matchers:
    - alertname="ContainerEphemeralStorageUsageAtLimit"
  target_matchers:
    - severity="warning"
    - alertname="ContainerEphemeralStorageUsageReachingLimit"
  equal:
    - pod_namespace
    - pod_name
    - exported_container
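
For orientation, here is a minimal sketch of where such rules sit in a standalone Alertmanager configuration file, assuming you edit that file directly. `inhibit_rules` is Alertmanager's top-level key for inhibition rules; if Alertmanager is managed through a Helm chart or an operator, the same list is typically nested under that tool's own configuration key instead, so check its documentation.

```yaml
# alertmanager.yml (excerpt) - a sketch, assuming a directly managed config file.
# The route and receivers sections of a real config are omitted here.
inhibit_rules:
  # While the "filled up" alert fires, mute the matching "filling up" warning
  # for the same namespace, pod, and volume.
  - source_matchers:
      - alertname="EphemeralStorageVolumeFilledUp"
    target_matchers:
      - severity="warning"
      - alertname="EphemeralStorageVolumeFillingUp"
    equal:
      - pod_namespace
      - pod_name
      - volume_name
  # The ContainerEphemeralStorageUsageAtLimit rule from the list above
  # is added here as a second item in the same way.
```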

Contribute

Start minikube

make new_minikube

Run locally

make deploy_local

Run locally with Delve Debug

make deploy_debug

Then connect to localhost:30002 with Delve or your IDE.

Run e2e Test

make deploy_e2e

Debug e2e

make deploy_e2e_debug

Then run a debug session against deployment_test.go.

License

This project is licensed under the MIT License. See the LICENSE file for more details.