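With the rise of cloud-native technology, the container orchestration tool Kubernetes (K8s for short) has been widely adopted for automated operations and application deployment. The monitoring system Prometheus is an essential companion: as a cloud-native monitoring solution it is lightweight yet feature-rich, providing real-time monitoring and failure alerting for applications. This article walks through several aspects of deploying Prometheus on K8s.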
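I. Deploying Prometheus on K8s with High Availability
In production, high availability is one of the most important considerations. K8s's elastic scaling and service discovery make a highly available Prometheus deployment straightforward. The steps are as follows:
1. First, run Prometheus inside the K8s cluster, for example with a StatefulSet controller. A StatefulSet gives each Pod a persistent name and a fixed network identity, which better guarantees Pod uniqueness and stability.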
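apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  replicas: 3
  serviceName: prometheus   # governing Service that gives each Pod a stable DNS name
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        # Setting args replaces the image's default arguments, so the
        # config file and data path are passed explicitly.
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention.time=3d
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          name: web
        volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus
      volumes:
      - name: prometheus-data
        emptyDir: {}   # ephemeral: the data is deleted when the Pod is removed from its node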
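The StatefulSet above keeps its data in an emptyDir volume, which does not survive Pod rescheduling. A minimal sketch of the usual alternative is a volumeClaimTemplates entry, which gives every replica its own PersistentVolumeClaim. The 10Gi size, the reliance on a default StorageClass, and the fsGroup value (65534, on the assumption that the prom/prometheus image runs as the nobody user) are illustrative assumptions, not part of the original article.

```yaml
# Fragment of the StatefulSet spec: use it instead of the emptyDir volume.
spec:
  template:
    spec:
      securityContext:
        fsGroup: 65534            # assumption: lets the non-root Prometheus user write to the volume
  volumeClaimTemplates:
  - metadata:
      name: prometheus-data       # must match the volumeMounts name in the container
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi           # assumption: size according to retention and ingest rate
```

With this in place the `volumes` entry for prometheus-data is dropped, and each Pod (prometheus-0, prometheus-1, ...) is re-attached to its own claim after a restart.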
2. Deploy a Service resource for Prometheus so that workloads throughout the cluster can discover the Prometheus service and communicate with it.
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  clusterIP: None   # headless, so per-Pod names such as prometheus-0.prometheus resolve
  selector:
    app: prometheus
  ports:
  - protocol: TCP
    port: 9090
    targetPort: web
3. Install an Etcd service in the K8s cluster to manage and synchronize state across the nodes of the cluster.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
spec:
  replicas: 3
  serviceName: etcd
  selector:
    matchLabels:
      app: etcd
  template:
    metadata:
      labels:
        app: etcd
    spec:
      containers:
      - name: etcd
        image: quay.io/coreos/etcd
        env:
        # Each replica needs its own member name; take it from the Pod name
        # (etcd-0, etcd-1, etcd-2) instead of hard-coding node1 for all of them.
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /usr/local/bin/etcd
        - --data-dir=/etcd-data
        - --name=$(POD_NAME)
        - --initial-advertise-peer-urls=http://$(POD_NAME).etcd:2380
        - --listen-peer-urls=http://0.0.0.0:2380
        - --advertise-client-urls=http://$(POD_NAME).etcd:2379
        - --listen-client-urls=http://0.0.0.0:2379
        - --initial-cluster=etcd-0=http://etcd-0.etcd:2380,etcd-1=http://etcd-1.etcd:2380,etcd-2=http://etcd-2.etcd:2380
        - --initial-cluster-token=etcd-cluster-1
        - --initial-cluster-state=new
        ports:
        - containerPort: 2380
          name: peer
        - containerPort: 2379
          name: client
        volumeMounts:
        - name: etcd-data
          mountPath: /etcd-data
      volumes:
      - name: etcd-data
        emptyDir: {}
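The StatefulSet above sets serviceName: etcd, but the article does not define that Service. Pods of a StatefulSet only get DNS names of the form <pod-name>.<serviceName> when a headless Service with that name exists, so a sketch of it might look as follows. Setting publishNotReadyAddresses is an assumption made here so that the members can resolve each other while the cluster is still bootstrapping.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd                        # must equal the StatefulSet's serviceName
spec:
  clusterIP: None                   # headless: DNS returns the individual Pod addresses
  publishNotReadyAddresses: true    # assumption: peers must find each other before they are Ready
  selector:
    app: etcd
  ports:
  - name: peer
    port: 2380
    targetPort: peer
  - name: client
    port: 2379
    targetPort: client
```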
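4. Point Prometheus at Etcd for storing its state, using remote_write as in the configuration below. The intent is that after a node goes down, the restarted Pod recovers its previous state (its original name and data directory) and no data is lost. Note, however, that etcd is a key-value store whose /metrics endpoint is only meant to be scraped; it does not implement the Prometheus remote-write protocol. In practice the remote_write URL has to point at a compatible remote storage endpoint, and local data survives a restart only if /prometheus sits on a persistent volume.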
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    rule_files:
    # rule_files entries are file globs, so match the files inside the directory
    - /etc/prometheus/rules/*.yml
    scrape_configs:
    - job_name: 'prometheus'
      scrape_interval: 5s
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'node-exporter'
      scrape_interval: 5s
      static_configs:
      - targets: ['localhost:9100']
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['localhost:9093']
    remote_write:
    # URL as given in the original article; the target must be a receiver that
    # implements the Prometheus remote-write protocol.
    - url: http://node1.etcd:2379/metrics
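A ConfigMap only takes effect once it is mounted into the Pod, and the StatefulSet from step 1 does not do that yet. The fragment below is a minimal sketch of the missing wiring. The mount path /etc/prometheus is an assumption chosen to match the --config.file path used elsewhere in this article.

```yaml
# Fragment of the Prometheus Pod template (spec.template.spec).
containers:
- name: prometheus
  image: prom/prometheus
  args:
  - --config.file=/etc/prometheus/prometheus.yml   # the key prometheus.yml becomes this file
  volumeMounts:
  - name: prometheus-conf
    mountPath: /etc/prometheus
volumes:
- name: prometheus-conf
  configMap:
    name: prometheus-conf                           # the ConfigMap defined above
```

Mounting over /etc/prometheus hides the default configuration shipped in the image, so every file that Prometheus needs there has to come from the ConfigMap or from additional mounts.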
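5. Adjust a few key parameters in the Prometheus configuration file so that the Prometheus instances and services are located inside the cluster. The example below changes `scrape_configs` and `remote_write`.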
scrape_configs:
- job_name: 'prometheus'
  scrape_interval: 5s
  static_configs:
  - targets: ['prometheus-0.prometheus.default.svc.cluster.local:9090', 'prometheus-1.prometheus.default.svc.cluster.local:9090', 'prometheus-2.prometheus.default.svc.cluster.local:9090']
- job_name: 'node-exporter'
  scrape_interval: 5s
  static_configs:
  - targets: ['localhost:9100']
remote_write:
- url: http://etcd:2379/metrics
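Listing every replica by hand has to be repeated whenever the replica count changes. As a hedged alternative, Prometheus can resolve the targets through DNS. The sketch below assumes that the prometheus Service lives in the default namespace and is headless (clusterIP: None), so that an A lookup returns one address per Pod.

```yaml
scrape_configs:
- job_name: 'prometheus'
  scrape_interval: 5s
  dns_sd_configs:
  - names: ['prometheus.default.svc.cluster.local']   # assumption: headless Service name
    type: A        # A records carry no port, so it is set explicitly below
    port: 9090
```

The trade-off is that targets are then labeled by Pod IP rather than by the stable prometheus-N host names.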
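Once the changes are made, restart Prometheus (or, because --web.enable-lifecycle is set, send an HTTP POST to /-/reload) to complete the highly available deployment in the K8s cluster.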
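II. Deploying Prometheus + Alertmanager on K8s
As with the high-availability deployment above, pairing Prometheus with Alertmanager is quick and convenient, especially for delivering alert notifications. The steps are as follows:
1. First, deploy Alertmanager in the K8s cluster using an alertmanager.yaml manifest like the one below. The service ports, the DNS names used to reach the service, and similar settings must be adjusted to your environment.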
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: prom/alertmanager
        args:
        - --config.file=/etc/alertmanager/config.yml
        - --storage.path=/alertmanager
        ports:
        - containerPort: 9093
          name: web
        volumeMounts:
        - name: alertmanager-data
          mountPath: /alertmanager
        # Mount the ConfigMap so that config.yml exists at the path passed above
        - name: alertmanager-conf
          mountPath: /etc/alertmanager
      volumes:
      - name: alertmanager-data
        emptyDir: {}
      - name: alertmanager-conf
        configMap:
          name: alertmanager-conf
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
spec:
  selector:
    app: alertmanager
  ports:
  - protocol: TCP
    port: 9093
    targetPort: web
  type: NodePort
  externalTrafficPolicy: Cluster
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-operated
spec:
  selector:
    app: alertmanager
  ports:
  - protocol: TCP
    port: 9093
    targetPort: web
  type: NodePort
  externalTrafficPolicy: Cluster
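The Deployment above references a ConfigMap named alertmanager-conf and starts Alertmanager with --config.file=/etc/alertmanager/config.yml, but the article does not show that ConfigMap. The following is a minimal sketch of what it could contain, assuming the ConfigMap is mounted at /etc/alertmanager. The receiver name and the webhook URL are hypothetical placeholders, and the grouping intervals are illustrative.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-conf
data:
  config.yml: |-            # key name must match the file name in --config.file
    route:
      receiver: default-webhook
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
    receivers:
    - name: default-webhook   # hypothetical receiver name
      webhook_configs:
      # hypothetical endpoint: replace with a real notification receiver
      - url: http://alert-receiver.default.svc.cluster.local:8080/alerts
```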
2. Configure alerting rules in Prometheus and have it push alerts to Alertmanager.
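# Rule file, loaded through rule_files in prometheus.yml (the group name is illustrative)
groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: warning
    annotations:
      description: 'High request latency: {{ $value }}'
      summary: 'High latency for {{ $labels.instance }}'
# prometheus.yml: alerting is a mapping, and each target needs a port
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - "alertmanager-operated.default.svc.cluster.local:9093"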
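The alert expression above reads job:request_latency_seconds:mean5m, a name in the level:metric:operations style normally produced by a recording rule, which the article does not show. A possible sketch follows. It assumes the application exports a histogram or summary called request_latency_seconds (so that the _sum and _count series exist), and the group name is hypothetical.

```yaml
groups:
- name: latency-recording          # hypothetical group name
  rules:
  - record: job:request_latency_seconds:mean5m
    # Mean latency over 5 minutes per job: total observed seconds / number of requests
    expr: |
      sum by (job) (rate(request_latency_seconds_sum[5m]))
      /
      sum by (job) (rate(request_latency_seconds_count[5m]))
```

Because this aggregates by job only, the resulting series carries no instance label. An alert built on it would render {{ $labels.instance }} as empty unless instance is added to the by clause.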
3. Deploy Prometheus in the K8s cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          name: web
        volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus
        # Mount the ConfigMap so that its prometheus.yml is the file passed above
        - name: prometheus-conf
          mountPath: /etc/prometheus
      volumes:
      - name: prometheus-data
        emptyDir: {}
      - name: prometheus-conf
        configMap:
          name: prometheus-conf
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
  - protocol: TCP
    port: 9090
    targetPort: web
  type: NodePort
  externalTrafficPolicy: Cluster
With these steps complete, Prometheus and Alertmanager work together, and alerts fire when the configured conditions are met.
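III. Deploying Multiple Prometheus Replicas on K8s
To avoid a single point of failure, we can run several Prometheus instances, which preserves both performance and availability.
1. First, deploy multiple Prometheus instances in the K8s cluster.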
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  replicas: 3
  serviceName: prometheus   # required field of a StatefulSet
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          name: web
        volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus
        # Mount the ConfigMap so that its prometheus.yml is the file passed above
        - name: prometheus-conf
          mountPath: /etc/prometheus
      volumes:
      - name: prometheus-data
        emptyDir: {}
      - name: prometheus-conf
        configMap:
          name: prometheus-conf
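Running three replicas only removes the single point of failure if they do not all land on the same node. A hedged sketch of how to spread them is a pod anti-affinity rule in the Pod template. The soft (preferred) form is used here as an assumption so that scheduling still succeeds on small clusters.

```yaml
# Fragment of the StatefulSet spec: prefer placing replicas on different nodes.
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: prometheus
              topologyKey: kubernetes.io/hostname   # one topology domain per node
```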
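2. Deploy a Service resource for the Prometheus instances. The single Service below selects all replicas through the app: prometheus label.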
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
  - protocol: TCP
    port: 9090
    targetPort: web
  type: NodePort
  externalTrafficPolicy: Cluster
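The Service above load-balances across all replicas, so two consecutive queries may be answered by different instances. If one Service per instance is wanted, a possible sketch is to select a single Pod through the statefulset.kubernetes.io/pod-name label that the StatefulSet controller sets on each Pod. The Service name prometheus-0 is an illustrative choice.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-0          # illustrative name: one Service per replica
spec:
  selector:
    statefulset.kubernetes.io/pod-name: prometheus-0   # matches exactly one Pod
  ports:
  - protocol: TCP
    port: 9090
    targetPort: web
  type: NodePort
```

The same manifest is repeated for prometheus-1 and prometheus-2 with the name and selector value changed.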
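With these steps complete, the multi-replica Prometheus deployment is in place. Note that a multi-replica deployment also requires changes to the Prometheus configuration file so that every instance works correctly inside the K8s cluster.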
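One example of such a configuration change is to replace localhost targets with Kubernetes service discovery, so that every replica finds the same targets wherever it is scheduled. The sketch below assumes that Pods opt in through the common prometheus.io/scrape: "true" annotation. It also assumes that the Prometheus Pods run under a ServiceAccount allowed to get, list and watch Pods, which is not shown here.

```yaml
scrape_configs:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod                      # discover every Pod through the API server
  relabel_configs:
  # Keep only Pods annotated with prometheus.io/scrape: "true" (assumed convention)
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Copy namespace and Pod name into regular labels for querying
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```

With identical configuration on all replicas, each instance scrapes the full target set independently, so any one of them can answer queries if the others are down.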
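IV. Process and Methods for Deploying Prometheus on K8s
The following walks step by step through deploying Prometheus in a K8s cluster.
1. Install a K8s cluster and make sure it has enough nodes. If you do not have a K8s cluster, follow the official documentation to install one.
2. Push the Prometheus image to your image registry, or save the image locally by hand and package it as a .tar archive.
3. Define a Deployment resource for Prometheus to run multiple Pod instances.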
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 3
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
Original article by SBCZ. If you repost it, please credit the source: https://www.506064.com/zh-hk/n/148998.html