How do I set the kubernetes_sd_configs parameters for Kubernetes service discovery in Prometheus?

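In a Kubernetes environment, the recommended approach is to use a kubernetes_sd_configs block directly in prometheus.yml, together with the right RBAC permissions, so that Prometheus discovers targets automatically instead of relying on a hand-maintained static list.

Bottom line: this is the standard service discovery mechanism for Kubernetes, and once configured correctly it tracks Pod and Service changes automatically.

Suitable for: Prometheus instances running inside a Kubernetes cluster, or able to reach the API Server
Prerequisites: a ServiceAccount with a matching ClusterRole must be configured, otherwise API requests fail with 403 errors
Acceptance check: the Targets page of the Prometheus UI shows targets in the UP state, and the logs contain no permission errors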

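Prerequisite: RBAC permissions

Prometheus needs access to the Kubernetes API Server to watch for resource changes. Without sufficient permissions, service discovery does not work. The following is a read-only RBAC example; save it as rbac.yaml. The Prometheus Pod must run under this ServiceAccount (serviceAccountName: prometheus) for the binding to take effect.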

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring

Apply it:

kubectl apply -f rbac.yaml
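
To make the link between the ClusterRole rules and a given discovery role more concrete, here is a rough offline check in Python. It is only a sketch under stated assumptions: the REQUIRED table reflects an assumption about which core-group resources each role watches (it is not taken from the text above), the helper names are hypothetical, and RBAC features such as resourceNames or aggregated roles are ignored. The API Server remains the authoritative answer.

```python
# Assumption: resources (core API group) that each discovery role reads.
REQUIRED = {
    "pod": ["pods"],
    "service": ["services"],
    "endpoints": ["endpoints", "services", "pods"],
    "node": ["nodes"],
}

# Mirrors the first rule of the ClusterRole in rbac.yaml.
CLUSTER_ROLE_RULES = [
    {
        "apiGroups": [""],
        "resources": ["nodes", "nodes/proxy", "services", "endpoints", "pods"],
        "verbs": ["get", "list", "watch"],
    },
]


def _allows(values, wanted):
    """True if the rule field lists the wanted value or the '*' wildcard."""
    return wanted in values or "*" in values


def missing_permissions(rules, role, verbs=("get", "list", "watch")):
    """Return (resource, verb) pairs that the rules do not grant for a role."""
    missing = []
    for resource in REQUIRED[role]:
        for verb in verbs:
            granted = any(
                _allows(rule.get("apiGroups", []), "")
                and _allows(rule.get("resources", []), resource)
                and _allows(rule.get("verbs", []), verb)
                for rule in rules
            )
            if not granted:
                missing.append((resource, verb))
    return missing


# Hypothetical, too-narrow rule set used only to show a failing case.
NARROW_RULES = [
    {"apiGroups": [""], "resources": ["services", "endpoints"],
     "verbs": ["get", "list", "watch"]},
]

print(missing_permissions(CLUSTER_ROLE_RULES, "endpoints"))
print(missing_permissions(NARROW_RULES, "endpoints"))
```

With the rules from rbac.yaml the first call reports nothing missing, while the narrowed rule set reports the three verbs on pods.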

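Writing the configuration

Add kubernetes_sd_configs under scrape_configs in prometheus.yml. The role field sits at the level of each kubernetes_sd_configs entry, and selectors narrows discovery further. The role inside a selectors entry names the resource type that the selector applies to.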

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        selectors:
          - role: pod
            label: app=myapp
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

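Key parameters:

role: the type of object to discover. Common values include pod, service, endpoints, and node.
selectors: optional; narrows the discovery scope so that Prometheus does not discover more targets than needed.
relabel_configs: filters targets or rewrites labels. The example above keeps only Pods annotated with prometheus.io/scrape: "true" and takes the metrics path from the prometheus.io/path annotation when it is set.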
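
To see what the two relabel rules do to a discovered Pod, the following Python sketch models the keep and replace actions in a simplified way. It is an illustration, not Prometheus's implementation: apply_relabel and the sample label sets are hypothetical, only these two actions are modeled, and the replacement is fixed to the first capture group (the default $1).

```python
import re


def apply_relabel(labels, rules):
    """Return the relabeled label dict, or None if the target is dropped."""
    labels = dict(labels)
    for rule in rules:
        # Source label values are joined with ";" (the default separator).
        value = ";".join(labels.get(name, "") for name in rule["source_labels"])
        # Prometheus anchors the regex at both ends, hence fullmatch.
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        action = rule.get("action", "replace")
        if action == "keep":
            if match is None:
                return None  # target is dropped
        elif action == "replace":
            if match is not None:
                # Simplification: always write the first capture group.
                labels[rule["target_label"]] = match.group(1)
    return labels


# The two rules from the scrape config, written as Python dicts.
RULES = [
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"],
     "action": "keep", "regex": "true"},
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_path"],
     "action": "replace", "target_label": "__metrics_path__", "regex": "(.+)"},
]

# Hypothetical label sets for an annotated Pod and an unannotated one.
annotated = {
    "__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true",
    "__meta_kubernetes_pod_annotation_prometheus_io_path": "/custom/metrics",
    "__metrics_path__": "/metrics",
}
plain = {"__metrics_path__": "/metrics"}

print(apply_relabel(annotated, RULES)["__metrics_path__"])  # /custom/metrics
print(apply_relabel(plain, RULES))                          # None
```

The unannotated Pod is dropped by the keep rule, which is why a scrape config like this shows no targets until the Pods carry the annotation.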

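Applying and reloading the configuration

After changing the configuration, Prometheus has to reload it. Depending on how it is deployed, there are two options:

Method 1: update the ConfigMap and restart (recommended)

If the Prometheus configuration is stored in a ConfigMap, update it and then do a rolling restart of the Deployment: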

kubectl apply -f prometheus-configmap.yaml
kubectl rollout restart deployment/prometheus -n monitoring

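Method 2: hot reload (SIGHUP or the reload endpoint)

Prometheus reloads its configuration when it receives SIGHUP. The HTTP reload endpoint is available only if Prometheus was started with --web.enable-lifecycle:

# Send SIGHUP to the Prometheus process (PID 1 in the container)
kubectl exec prometheus-pod-name -n monitoring -- kill -HUP 1
# or, with --web.enable-lifecycle set, call the reload endpoint (from inside the Pod or through a port-forward)
curl -X POST http://localhost:9090/-/reload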
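
The reload endpoint can also be called from a script. The sketch below uses only the Python standard library; the function names are hypothetical, and the base URL is an assumption that depends on how Prometheus is exposed (for example through kubectl port-forward).

```python
import urllib.request


def build_reload_request(base_url):
    """Build the POST request for the /-/reload lifecycle endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    # The endpoint takes no request body; an empty one is sent with POST.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url, timeout=10):
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on a non-2xx response, for
    example when the lifecycle API is not enabled.
    """
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (assumed address, adjust to your setup):
#   reload_config("http://localhost:9090")
```

Splitting request construction from sending keeps the URL handling easy to check without a running Prometheus.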

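Verification

1. UI check

Open the Targets page in the Prometheus UI and confirm that the discovered targets are listed with state UP.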

2. API query

Query target state directly with curl:

curl http://<prometheus-url>/api/v1/targets | jq '.data.activeTargets[].labels.job'
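
The jq filter above prints only job names. For a quick health overview, the response can be summarized with a few lines of Python. This is a sketch: summarize_targets is a hypothetical helper, and SAMPLE is a made-up, trimmed response used for illustration. In practice the payload would come from the /api/v1/targets call.

```python
import json
from collections import Counter


def summarize_targets(payload):
    """Count active targets per (job, health) and collect scrape errors."""
    counts = Counter()
    errors = []
    for target in payload["data"]["activeTargets"]:
        job = target["labels"].get("job", "")
        health = target.get("health", "unknown")
        counts[(job, health)] += 1
        # Record the last scrape error of targets that are not healthy.
        if health != "up" and target.get("lastError"):
            errors.append((target.get("scrapeUrl", ""), target["lastError"]))
    return counts, errors


# Hypothetical, trimmed response for illustration only.
SAMPLE = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"job": "kubernetes-pods", "instance": "10.0.0.5:8080"},
       "scrapeUrl": "http://10.0.0.5:8080/metrics",
       "health": "up", "lastError": ""},
      {"labels": {"job": "kubernetes-pods", "instance": "10.0.0.6:8080"},
       "scrapeUrl": "http://10.0.0.6:8080/metrics",
       "health": "down", "lastError": "connection refused"}
    ],
    "droppedTargets": []
  }
}
""")

counts, errors = summarize_targets(SAMPLE)
for (job, health), number in sorted(counts.items()):
    print(job, health, number)
for url, error in errors:
    print("error:", url, error)
```

A job with many targets in a state other than up, or a job missing from the output entirely, points to the troubleshooting steps later in this article.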

3. Log inspection

Check the Prometheus logs for permission errors or connection timeouts:

kubectl logs -l app=prometheus -n monitoring --tail=100

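Common pitfalls and troubleshooting

1. Insufficient permissions (403 Forbidden)

The logs show user "system:serviceaccount:monitoring:prometheus" cannot list resource "pods". Check that the ClusterRoleBinding binds the right ServiceAccount and that the namespace matches. Running kubectl auth can-i list pods --as=system:serviceaccount:monitoring:prometheus confirms whether the permission is actually granted.

2. No network connectivity

Prometheus cannot reach the API Server. Check in-cluster DNS resolution (nslookup kubernetes.default) and firewall rules. If Prometheus runs outside the cluster, the kubeconfig path must be configured correctly.

3. Relabeling drops targets by mistake

A misconfigured relabel_configs can drop every target. Comment out the filtering rules first, confirm that targets are discovered, and then add the filters back one at a time.

4. Configuration syntax errors

When Prometheus fails to start, the cause is usually a configuration parse error reported in the logs. Validate the configuration with the official tool: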

promtool check config prometheus.yml

References

Prometheus official documentation, Configuration, kubernetes_sd_config section: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config