How to Set Up Grafana for Kubernetes in Linux
Introduction
Monitoring and observability are critical components of any successful Kubernetes deployment. Grafana, a powerful open-source analytics and interactive visualization web application, provides an excellent solution for monitoring Kubernetes clusters. This comprehensive guide will walk you through the complete process of setting up Grafana for Kubernetes monitoring in a Linux environment.
By the end of this tutorial, you'll have a fully functional Grafana installation that can visualize metrics from your Kubernetes cluster, create custom dashboards, and provide valuable insights into your containerized applications' performance and health.
Table of Contents
1. Prerequisites and Requirements
2. Understanding Grafana and Kubernetes Integration
3. Method 1: Installing Grafana using Helm
4. Method 2: Installing Grafana using YAML Manifests
5. Configuring Data Sources
6. Creating and Importing Dashboards
7. Setting Up Alerts and Notifications
8. Security Considerations
9. Troubleshooting Common Issues
10. Best Practices and Tips
11. Advanced Configuration Options
12. Conclusion and Next Steps
Prerequisites and Requirements
Before beginning the Grafana installation process, ensure you have the following prerequisites in place:
System Requirements
- Linux Distribution: Ubuntu 18.04+, CentOS 7+, or RHEL 7+
- Kubernetes Cluster: Version 1.16 or higher
- kubectl: Configured to communicate with your cluster
- Helm: Version 3.0+ (if using Helm installation method)
- Memory: Minimum 2GB RAM available for Grafana pod
- Storage: At least 10GB persistent storage
Access Requirements
- Administrative access to the Kubernetes cluster
- Network connectivity to pull container images
- Appropriate RBAC permissions for service account creation
- Access to configure ingress controllers (if exposing externally)
Recommended Tools
```bash
# Verify kubectl access
kubectl cluster-info

# Check Helm installation
helm version

# Verify node resources
kubectl top nodes
```
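If you want to confirm the RBAC permissions listed above before starting, `kubectl auth can-i` gives a quick answer. A minimal sketch, assuming the `monitoring` namespace and the cluster-scoped objects used later in this guide:

```bash
# Can we create namespaces and service accounts?
kubectl auth can-i create namespaces
kubectl auth can-i create serviceaccounts -n monitoring

# Can we create cluster-wide RBAC objects (needed for the manifest-based install)?
kubectl auth can-i create clusterroles
kubectl auth can-i create clusterrolebindings
```

Each command prints `yes` or `no`, so missing permissions surface before an installation fails halfway through.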
Understanding Grafana and Kubernetes Integration
Grafana integrates with Kubernetes through several key components:
Core Components
1. Grafana Server: The main application providing the web interface
2. Data Sources: Connections to metrics providers like Prometheus
3. Dashboards: Visual representations of your metrics
4. Persistent Storage: For storing dashboards, users, and configuration
5. Service Account: For Kubernetes API access
Architecture Overview
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Kubernetes    │───▶ │   Prometheus    │───▶ │     Grafana     │
│   Metrics API   │     │  (Data Source)  │     │ (Visualization) │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
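This guide assumes a Prometheus instance is already scraping the cluster and is reachable as `prometheus-server.monitoring.svc.cluster.local`, the service name used in the data source URLs below. If you do not have one yet, a common option is the community Helm chart; the following is a sketch with default settings, which creates a service with exactly that name:

```bash
# Add the Prometheus community chart repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install prometheus-server (plus node-exporter and kube-state-metrics) into "monitoring"
helm install prometheus prometheus-community/prometheus \
  --namespace monitoring \
  --create-namespace
```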
Method 1: Installing Grafana using Helm
Helm provides the most straightforward method for installing Grafana with sensible defaults and easy customization options.
Step 1: Add the Grafana Helm Repository
```bash
# Add the official Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts

# Update repository information
helm repo update

# Verify the repository addition
helm search repo grafana/grafana
```
Step 2: Create a Namespace
```bash
# Create a dedicated namespace for monitoring
kubectl create namespace monitoring

# Verify namespace creation
kubectl get namespaces | grep monitoring
```
Step 3: Create Custom Values File
Create a `grafana-values.yaml` file to customize your installation:
```yaml
# grafana-values.yaml
persistence:
  enabled: true
  size: 10Gi
  storageClassName: "standard"

adminPassword: "your-secure-password"

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - grafana.yourdomain.com
  tls:
    - secretName: grafana-tls
      hosts:
        - grafana.yourdomain.com

resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 250m
    memory: 512Mi

serviceAccount:
  create: true
  name: grafana

rbac:
  create: true

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-server.monitoring.svc.cluster.local
        access: proxy
        isDefault: true

dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: 'default'
        orgId: 1
        folder: ''
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default

dashboards:
  default:
    kubernetes-cluster-monitoring:
      gnetId: 7249
      revision: 1
      datasource: Prometheus
    kubernetes-pod-monitoring:
      gnetId: 6417
      revision: 1
      datasource: Prometheus
```
Step 4: Install Grafana
```bash
# Install Grafana using Helm with custom values
helm install grafana grafana/grafana \
  --namespace monitoring \
  --values grafana-values.yaml

# Verify the installation
kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana

# Check the service status
kubectl get svc -n monitoring grafana
```
Step 5: Access Grafana
```bash
# Get the admin password (if not set in the values file)
kubectl get secret --namespace monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

# Port-forward to access Grafana locally
kubectl port-forward --namespace monitoring svc/grafana 3000:80

# Access Grafana at http://localhost:3000
# Username: admin
# Password: retrieved above, or the custom password from grafana-values.yaml
```
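If you later adjust `grafana-values.yaml`, for example to change resources or add dashboards, the release can be updated in place rather than reinstalled. A brief sketch:

```bash
# Apply updated values to the existing release
helm upgrade grafana grafana/grafana \
  --namespace monitoring \
  --values grafana-values.yaml

# Review the release history and roll back if the upgrade misbehaves
helm history grafana --namespace monitoring
helm rollback grafana 1 --namespace monitoring
```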
Method 2: Installing Grafana using YAML Manifests
For more granular control or environments where Helm isn't available, you can deploy Grafana using Kubernetes YAML manifests.
Step 1: Create Namespace and Service Account
```yaml
# monitoring-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana
subjects:
  - kind: ServiceAccount
    name: grafana
    namespace: monitoring
```
Step 2: Create ConfigMaps for Configuration
```yaml
# grafana-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-config
  namespace: monitoring
data:
  grafana.ini: |
    [server]
    root_url = %(protocol)s://%(domain)s:%(http_port)s/

    [security]
    admin_user = admin
    admin_password = your-secure-password

    [users]
    allow_sign_up = false

    [auth.anonymous]
    enabled = false

    [log]
    mode = console
    level = info
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus-server.monitoring.svc.cluster.local
        isDefault: true
```
Step 3: Create Persistent Volume Claim
```yaml
# grafana-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
```
Step 4: Create Grafana Deployment
```yaml
# grafana-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      serviceAccountName: grafana
      containers:
        - name: grafana
          image: grafana/grafana:latest
          ports:
            - containerPort: 3000
          env:
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: "your-secure-password"
            - name: GF_USERS_ALLOW_SIGN_UP
              value: "false"
          volumeMounts:
            - name: grafana-storage
              mountPath: /var/lib/grafana
            - name: grafana-config
              mountPath: /etc/grafana/grafana.ini
              subPath: grafana.ini
            - name: grafana-datasources
              mountPath: /etc/grafana/provisioning/datasources
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 250m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 30
            timeoutSeconds: 30
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 5
            timeoutSeconds: 10
      volumes:
        - name: grafana-storage
          persistentVolumeClaim:
            claimName: grafana-pvc
        - name: grafana-config
          configMap:
            name: grafana-config
        - name: grafana-datasources
          configMap:
            name: grafana-datasources
```
Step 5: Create Service and Ingress
```yaml
# grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - grafana.yourdomain.com
      secretName: grafana-tls
  rules:
    - host: grafana.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 80
```
Step 6: Apply the Manifests
```bash
# Apply all manifests
kubectl apply -f monitoring-namespace.yaml
kubectl apply -f grafana-config.yaml
kubectl apply -f grafana-pvc.yaml
kubectl apply -f grafana-deployment.yaml
kubectl apply -f grafana-service.yaml

# Verify the deployment
kubectl get all -n monitoring -l app=grafana
```
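Before moving on, confirm that the rollout finished and that the UI responds. Until your ingress DNS and TLS are in place, a port-forward is the quickest way in; a sketch using the service created above:

```bash
# Wait for the deployment to become ready
kubectl rollout status deployment/grafana -n monitoring

# Temporarily forward the service, then open http://localhost:3000
kubectl port-forward -n monitoring svc/grafana 3000:80
```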
Configuring Data Sources
Once Grafana is running, configure a data source so it can query your Kubernetes metrics. If you installed with the Helm values file above, the Prometheus data source is already provisioned; the steps below also apply when adding sources manually or using the manifest-based install.
Prometheus Data Source Configuration
1. Access Grafana Interface: Navigate to your Grafana instance
2. Go to Configuration: Click on the gear icon in the left sidebar
3. Select Data Sources: Click on "Data Sources"
4. Add Data Source: Click "Add data source" and select "Prometheus"
Configure the Prometheus data source with these settings:
```yaml
Name: Prometheus
URL: http://prometheus-server.monitoring.svc.cluster.local
Access: Server (default)
Scrape interval: 15s
Query timeout: 60s
HTTP Method: GET
```
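The same data source can also be created through Grafana's HTTP API, which is useful for scripting. A minimal sketch, assuming the admin credentials and port-forward from earlier (substitute your ingress URL if Grafana is exposed externally):

```bash
# Create the Prometheus data source via the Grafana API
curl -s -X POST http://admin:your-secure-password@localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -d '{
        "name": "Prometheus",
        "type": "prometheus",
        "url": "http://prometheus-server.monitoring.svc.cluster.local",
        "access": "proxy",
        "isDefault": true
      }'

# Confirm the data source was created
curl -s http://admin:your-secure-password@localhost:3000/api/datasources
```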
Testing the Connection
```bash
# Start a temporary pod to test Prometheus connectivity from within the cluster
kubectl run test-pod --rm -i --tty --image=curlimages/curl -- sh

# Inside the pod, test the connection
curl http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=up
```
Creating and Importing Dashboards
Grafana dashboards provide visual representations of your Kubernetes metrics. You can create custom dashboards or import pre-built ones from the Grafana community.
Importing Pre-built Dashboards
1. Navigate to Dashboards: Click the "+" icon and select "Import"
2. Enter Dashboard ID: Use popular Kubernetes dashboard IDs:
- Kubernetes Cluster Monitoring: 7249
- Kubernetes Pod Monitoring: 6417
- Node Exporter Full: 1860
- Kubernetes Deployment Statefulset Daemonset metrics: 8588
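These community dashboards can also be pulled and loaded without the UI. The sketch below downloads revision 1 of the Node Exporter Full dashboard (ID 1860) from grafana.com and posts it to the dashboard API; `<api-key>` is a placeholder for a token with Editor rights, and dashboards that declare `__inputs` may still need their data source mapped through the import UI:

```bash
# Download a community dashboard definition from grafana.com
curl -sL https://grafana.com/api/dashboards/1860/revisions/1/download \
  -o node-exporter-full.json

# Null out the id, wrap the payload, and post it to Grafana (port-forward assumed)
jq '{dashboard: (. + {id: null}), overwrite: true}' node-exporter-full.json | \
  curl -s -X POST http://localhost:3000/api/dashboards/db \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <api-key>" \
    -d @-
```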
Creating Custom Dashboards
```json
{
  "dashboard": {
    "id": null,
    "title": "Custom Kubernetes Dashboard",
    "panels": [
      {
        "title": "CPU Usage by Pod",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(container_cpu_usage_seconds_total{pod!=\"\"}[5m])) by (pod)",
            "legendFormat": "{{pod}}"
          }
        ]
      },
      {
        "title": "Memory Usage by Pod",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(container_memory_usage_bytes{pod!=\"\"}) by (pod)",
            "legendFormat": "{{pod}}"
          }
        ]
      }
    ]
  }
}
```
Dashboard as Code
Save dashboards as JSON files for version control:
```bash
# Export a dashboard (replace <api-key> with a Grafana API key)
curl -H "Authorization: Bearer <api-key>" \
  http://grafana.yourdomain.com/api/dashboards/db/kubernetes-overview

# Import a dashboard via the API
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <api-key>" \
  -d @dashboard.json \
  http://grafana.yourdomain.com/api/dashboards/db
```
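The Bearer tokens used above are Grafana API keys (newer releases favor service account tokens, but keys still work for this purpose). One way to create one from the command line is with the admin account; a sketch, with the key name and role chosen purely for illustration:

```bash
# Create an API key named "dashboards-ci" with the Editor role
curl -s -X POST http://admin:your-secure-password@localhost:3000/api/auth/keys \
  -H "Content-Type: application/json" \
  -d '{"name": "dashboards-ci", "role": "Editor"}'
```

The response contains the key exactly once, so store it somewhere safe, for example in a Kubernetes secret.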
Setting Up Alerts and Notifications
Grafana's alerting system helps you stay informed about critical issues in your Kubernetes cluster.
Configuring Alert Rules
1. Create Alert Rule: In your dashboard panel, click "Edit" → "Alert"
2. Set Conditions: Define when alerts should trigger
Example alert configuration:
```yaml
# Alert for high CPU usage
Name: High CPU Usage
Condition:
  Query: A
  Reducer: avg
  Type: Is above
  Threshold: 80
Evaluation:
  Evaluate every: 1m
  For: 5m
Notifications:
  Send to: kubernetes-alerts
```
Setting Up Notification Channels
Configure notification channels for alert delivery:
```yaml
# Slack notification channel
Name: kubernetes-alerts
Type: Slack
Settings:
  URL: https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
  Channel: #alerts
  Title: Kubernetes Alert
  Text: |
    Alert: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}
    Status: {{ .Status }}
    Severity: {{ .CommonLabels.severity }}
```
Email Notifications
Configure SMTP settings in Grafana configuration:
```ini
[smtp]
enabled = true
host = smtp.gmail.com:587
user = your-email@gmail.com
password = your-app-password
skip_verify = false
from_address = your-email@gmail.com
from_name = Grafana Alerts
```
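Rather than baking SMTP credentials into `grafana.ini`, the same settings can be supplied as environment variables, since Grafana maps `GF_<SECTION>_<KEY>` variables onto its ini options. A sketch that patches the running deployment, with the password held in a hypothetical `grafana-smtp` secret:

```bash
# Store the SMTP password under the exact variable name Grafana expects
kubectl create secret generic grafana-smtp -n monitoring \
  --from-literal=GF_SMTP_PASSWORD='your-app-password'

# Inject the non-secret SMTP settings as environment variables
kubectl set env deployment/grafana -n monitoring \
  GF_SMTP_ENABLED=true \
  GF_SMTP_HOST=smtp.gmail.com:587 \
  GF_SMTP_USER=your-email@gmail.com \
  GF_SMTP_FROM_ADDRESS=your-email@gmail.com \
  GF_SMTP_FROM_NAME="Grafana Alerts"

# Pull the password in from the secret
kubectl set env deployment/grafana -n monitoring --from=secret/grafana-smtp
```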
Security Considerations
Securing your Grafana installation is crucial for protecting sensitive monitoring data.
Authentication and Authorization
1. Disable Anonymous Access:
```ini
[auth.anonymous]
enabled = false
```
2. Configure LDAP/OAuth:
```ini
[auth.ldap]
enabled = true
config_file = /etc/grafana/ldap.toml
```
3. Set Strong Admin Password:
```bash
# Generate a secure password
openssl rand -base64 32

# Update the admin password stored in the Grafana secret
kubectl patch secret grafana -n monitoring -p '{"data":{"admin-password":"'$(echo -n 'new-password' | base64)'"}}'
```
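One caveat: Grafana stores the admin password in its own database after first start, so patching the Kubernetes secret only affects future pod starts. To change it on a running instance, `grafana-cli` can reset it in place; a sketch (the `--homepath` value matches the official image layout):

```bash
# Reset the admin password inside the running pod
kubectl exec -n monitoring deployment/grafana -- \
  grafana-cli --homepath /usr/share/grafana admin reset-admin-password 'new-password'

# Restart so future pods also start from the updated secret
kubectl rollout restart deployment/grafana -n monitoring
```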
Network Security
1. Use TLS/HTTPS:
```yaml
# Enable TLS in the ingress
spec:
  tls:
    - hosts:
        - grafana.yourdomain.com
      secretName: grafana-tls
```
2. Network Policies:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-network-policy
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app: grafana
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 3000
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 9090
```
Data Protection
1. Backup Dashboards:
```bash
#!/bin/bash
# Backup script
GRAFANA_URL="http://grafana.yourdomain.com"
API_KEY="your-api-key"
BACKUP_DIR="/backup/grafana-$(date +%Y%m%d)"

mkdir -p "$BACKUP_DIR"

# Export all dashboards
curl -H "Authorization: Bearer $API_KEY" \
  "$GRAFANA_URL/api/search?type=dash-db" | \
  jq -r '.[].uri' | \
  while read -r uri; do
    slug=$(echo "$uri" | cut -d'/' -f2)
    curl -H "Authorization: Bearer $API_KEY" \
      "$GRAFANA_URL/api/dashboards/$uri" > "$BACKUP_DIR/$slug.json"
  done
```
Troubleshooting Common Issues
Issue 1: Grafana Pod Not Starting
Symptoms: Pod remains in `Pending` or `CrashLoopBackOff` state
Diagnosis:
```bash
# Check pod status
kubectl get pods -n monitoring -l app=grafana

# View pod logs
kubectl logs -n monitoring deployment/grafana

# Describe the pod to see recent events
kubectl describe pod -n monitoring -l app=grafana
```
Solutions:
1. Insufficient Resources:
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
  limits:
    memory: "1Gi"
    cpu: "500m"
```
2. Storage Issues:
```bash
# Check PVC status
kubectl get pvc -n monitoring

# Verify the storage class
kubectl get storageclass
```
Issue 2: Cannot Connect to Data Sources
Symptoms: "HTTP Error Bad Gateway" or connection timeouts
Diagnosis:
```bash
# Test connectivity from the Grafana pod
kubectl exec -n monitoring deployment/grafana -- \
  curl -v http://prometheus-server.monitoring.svc.cluster.local:9090/api/v1/query?query=up

# Check service endpoints
kubectl get endpoints -n monitoring prometheus-server
```
Solutions:
1. Verify Service Names:
```bash
# List services in the monitoring namespace
kubectl get svc -n monitoring

# Check DNS resolution from the Grafana pod
kubectl exec -n monitoring deployment/grafana -- \
  nslookup prometheus-server.monitoring.svc.cluster.local
```
2. Network Policy Issues:
```yaml
# Allow egress from Grafana to Prometheus
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-egress
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app: grafana
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 9090
```
Issue 3: Dashboard Import Failures
Symptoms: "Dashboard import failed" or missing visualizations
Solutions:
1. Check Data Source Compatibility:
```bash
# Verify Prometheus is returning data
curl "http://prometheus-server.monitoring.svc.cluster.local:9090/api/v1/label/__name__/values"
```
2. Update Dashboard JSON:
```json
{
  "datasource": {
    "type": "prometheus",
    "uid": "${DS_PROMETHEUS}"
  }
}
```
Issue 4: Performance Issues
Symptoms: Slow dashboard loading or query timeouts
Solutions:
1. Optimize Queries:
```promql
# Instead of this (high cardinality)
sum(rate(container_cpu_usage_seconds_total[5m])) by (container, pod, namespace)

# Use this (lower cardinality)
sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)
```
2. Increase Resources:
```yaml
resources:
  limits:
    cpu: 1000m
    memory: 2Gi
  requests:
    cpu: 500m
    memory: 1Gi
```
3. Configure Query Timeout:
```ini
[dataproxy]
timeout = 30
```
Best Practices and Tips
Performance Optimization
1. Use Appropriate Time Ranges:
- Limit default time ranges to reasonable periods
- Use variables for dynamic time selection
- Implement auto-refresh intervals carefully
2. Query Optimization:
```promql
# Good: specific label matching
rate(http_requests_total{job="api-server"}[5m])

# Avoid: broad queries without filtering
rate(http_requests_total[5m])
```
3. Dashboard Organization:
- Group related panels logically
- Use folders for dashboard organization
- Implement consistent naming conventions
Monitoring Strategy
1. Key Metrics to Monitor:
- Cluster Level: Node CPU, Memory, Disk usage
- Namespace Level: Resource quotas and limits
- Pod Level: Container restart counts, resource usage
- Application Level: Custom application metrics
2. Alert Strategy:
```yaml
# Critical alerts (immediate response)
- High CPU usage (>90% for 5 minutes)
- Memory exhaustion (>95% for 2 minutes)
- Pod crash loops (>5 restarts in 10 minutes)

# Warning alerts (investigation needed)
- Moderate CPU usage (>70% for 15 minutes)
- High memory usage (>80% for 10 minutes)
- Persistent volume space (>85% full)
```
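To sanity-check any of the cluster, namespace, or pod level metrics above outside Grafana, you can query Prometheus directly from a pod inside the cluster (for example the curl test pod used earlier). A sketch using the service name assumed throughout this guide:

```bash
# Per-namespace CPU usage over the last 5 minutes
curl -sG "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query" \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)'

# Containers that restarted in the last 10 minutes (requires kube-state-metrics)
curl -sG "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query" \
  --data-urlencode 'query=increase(kube_pod_container_status_restarts_total[10m]) > 0'
```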
Maintenance and Operations
1. Regular Backups:
```bash
#!/bin/bash
# Automated backup: trigger a one-off job from an existing grafana-backup CronJob
kubectl create job grafana-backup-$(date +%s) \
  --from=cronjob/grafana-backup \
  --namespace=monitoring
```
2. Version Management:
```yaml
# Use a specific, pinned version in production
image: grafana/grafana:9.3.2
```
3. Monitoring Grafana Itself:
```promql
# Grafana uptime
up{job="grafana"}

# Dashboard load times
grafana_dashboard_load_duration_seconds
```
Advanced Configuration Options
High Availability Setup
For production environments, consider running Grafana with multiple replicas. Keep in mind that this requires a shared external database (covered in the next section) so that all instances see the same dashboards, users, and sessions:
```yaml
# grafana-ha-deployment.yaml (abridged; selector, labels, and the container
# spec are the same as in the single-replica deployment shown earlier)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - grafana
                topologyKey: kubernetes.io/hostname
```
External Database Configuration
For enterprise deployments, use external databases:
```yaml
env:
  - name: GF_DATABASE_TYPE
    value: mysql
  - name: GF_DATABASE_HOST
    value: mysql.database.svc.cluster.local:3306
  - name: GF_DATABASE_NAME
    value: grafana
  - name: GF_DATABASE_USER
    valueFrom:
      secretKeyRef:
        name: grafana-db-secret
        key: username
  - name: GF_DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: grafana-db-secret
        key: password
```
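The manifest above references a `grafana-db-secret` that must exist beforehand; a minimal way to create it (substitute real credentials, or manage the secret with your usual tooling):

```bash
# Create the database credentials secret referenced by the deployment
kubectl create secret generic grafana-db-secret \
  --namespace monitoring \
  --from-literal=username=grafana \
  --from-literal=password='choose-a-strong-password'
```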
Custom Plugins
Install custom plugins for extended functionality:
```yaml
env:
  - name: GF_INSTALL_PLUGINS
    value: "grafana-piechart-panel,grafana-worldmap-panel"
initContainers:
  - name: download-plugins
    image: grafana/grafana:latest
    command:
      - sh
      - -c
      - |
        grafana-cli plugins install grafana-piechart-panel
        grafana-cli plugins install grafana-worldmap-panel
    volumeMounts:
      - name: grafana-plugins
        mountPath: /var/lib/grafana/plugins
```
Conclusion and Next Steps
Setting up Grafana for Kubernetes monitoring in Linux provides powerful visualization capabilities for your containerized infrastructure. This comprehensive guide has covered:
- Multiple installation methods (Helm and YAML manifests)
- Data source configuration and integration
- Dashboard creation and management
- Security considerations and best practices
- Troubleshooting common issues
- Advanced configuration options
Next Steps
1. Expand Monitoring Coverage:
- Install Prometheus Node Exporter for detailed node metrics
- Add application-specific metrics and dashboards
- Implement distributed tracing with Jaeger integration
2. Enhance Alerting:
- Create comprehensive alert rules for your specific use cases
- Set up alert routing and escalation policies
- Implement alert fatigue reduction strategies
3. Automation and GitOps:
- Implement dashboard as code practices
- Set up automated backup and restore procedures
- Integrate with CI/CD pipelines for dashboard deployment
4. Performance Tuning:
- Monitor Grafana's own performance metrics
- Optimize query performance and dashboard load times
- Implement caching strategies for frequently accessed data
5. Advanced Features:
- Explore Grafana's annotation capabilities
- Implement template variables for dynamic dashboards
- Set up data source proxying and federation
By following this guide and implementing these best practices, you'll have a robust monitoring solution that provides valuable insights into your Kubernetes cluster's health and performance. Remember to regularly review and update your monitoring strategy as your infrastructure evolves and grows.
The combination of Kubernetes, Prometheus, and Grafana creates a powerful observability stack that scales with your applications and provides the visibility needed to maintain reliable, high-performance containerized services.