How to monitor Linux servers with Prometheus
Prometheus has emerged as one of the most powerful and popular monitoring solutions for modern infrastructure. This open-source system monitoring and alerting toolkit provides comprehensive insights into Linux server performance, resource utilization, and system health. Whether you're managing a single server or a complex distributed infrastructure, Prometheus offers the scalability and flexibility needed to maintain optimal system performance.
In this comprehensive guide, you'll learn how to set up Prometheus to monitor Linux servers effectively. We'll cover everything from basic installation and configuration to advanced monitoring strategies, alerting rules, and best practices that will help you build a robust monitoring infrastructure.
What is Prometheus and Why Use It for Linux Server Monitoring?
Prometheus is a time-series database and monitoring system originally developed by SoundCloud. It collects metrics from configured targets at specified intervals, evaluates rule expressions, displays results, and triggers alerts when specified conditions are met. For Linux server monitoring, Prometheus offers several key advantages:
- Pull-based architecture: Prometheus scrapes metrics from targets over HTTP, so an unreachable target shows up immediately as a failed scrape
- Powerful query language: PromQL allows complex queries and aggregations
- Service discovery: Automatic discovery of monitoring targets
- Built-in alerting: Prometheus evaluates alert rules itself and hands firing alerts to the companion Alertmanager for routing and notification
- Scalability: Handles thousands of targets and millions of time series
- Extensive ecosystem: Large collection of exporters for various services
Prerequisites and Requirements
Before beginning the Prometheus setup process, ensure you have the following prerequisites in place:
System Requirements
- Linux server with at least 2GB RAM and 10GB free disk space
- Root or sudo access to the target servers
- Network connectivity between Prometheus server and monitored targets
- Basic understanding of Linux command line and system administration
Software Prerequisites
- Linux distribution (Ubuntu 18.04+, CentOS 7+, or similar)
- Wget or curl for downloading packages
- Text editor (vim, nano, or your preferred editor)
- Firewall configuration knowledge (iptables, ufw, or firewalld)
Network Requirements
- Open ports: 9090 (Prometheus), 9100 (Node Exporter), 9093 (Alertmanager)
- Stable network connectivity between monitoring components
- DNS resolution or proper /etc/hosts configuration
Step 1: Installing Prometheus Server
Download and Install Prometheus
First, create a dedicated user for running Prometheus services:
```bash
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
```
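Download a Prometheus release. This guide uses v2.40.0; check the project's releases page for the current version and adjust the file names accordingly: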
```bash
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.40.0/prometheus-2.40.0.linux-amd64.tar.gz
tar xvf prometheus-2.40.0.linux-amd64.tar.gz
cd prometheus-2.40.0.linux-amd64
```
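Before copying the binaries into place, it may be worth checking the tarball against the project's published checksums. The sketch below assumes that a `sha256sums.txt` file is published next to the tarball on the release page (the URL in the comment follows that assumption) and that GNU `sha256sum` is installed; `verify_tarball` is a hypothetical helper name.

```bash
# verify_tarball DIR SUMS_FILE
# Succeeds only if the files in DIR that are listed in SUMS_FILE match their
# SHA-256 checksums. SUMS_FILE is resolved relative to DIR.
verify_tarball() {
  # --ignore-missing skips checksum entries for files that were not downloaded.
  (cd "$1" && sha256sum --check --ignore-missing "$2")
}

# Assumed usage (the checksum file URL is an assumption, see above):
#   wget -P /tmp https://github.com/prometheus/prometheus/releases/download/v2.40.0/sha256sums.txt
#   verify_tarball /tmp sha256sums.txt
```

A non-zero exit status means at least one file did not match, or no listed file was found, and the download should be repeated.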
Copy the binaries and set proper permissions:
```bash
sudo cp prometheus /usr/local/bin/
sudo cp promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
sudo cp -r consoles /etc/prometheus
sudo cp -r console_libraries /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
```
Configure Prometheus
Create the main Prometheus configuration file:
```bash
sudo nano /etc/prometheus/prometheus.yml
```
Add the following basic configuration:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'linux-servers'
    static_configs:
      - targets: ['localhost:9100']
```
Set proper ownership:
```bash
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
```
Create Prometheus Service
Create a systemd service file:
```bash
sudo nano /etc/systemd/system/prometheus.service
```
Add the following configuration:
```ini
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle
[Install]
WantedBy=multi-user.target
```
Enable and start the Prometheus service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
```
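Beyond `systemctl status`, Prometheus exposes the management endpoints `/-/healthy` and `/-/ready`, which can confirm that the server is actually answering requests. A minimal sketch, assuming `curl` is installed and Prometheus listens on `localhost:9090` as configured above; `check_endpoint` is a hypothetical helper name.

```bash
# check_endpoint URL
# Prints "ok" if the URL answers with a successful HTTP status,
# "unreachable" otherwise. It never aborts the calling script.
check_endpoint() {
  if curl --silent --fail --max-time 5 --output /dev/null "$1"; then
    echo "ok"
  else
    echo "unreachable"
  fi
}

echo "healthy: $(check_endpoint http://localhost:9090/-/healthy)"
echo "ready:   $(check_endpoint http://localhost:9090/-/ready)"
```

If either line reports `unreachable`, the service log (`sudo journalctl -u prometheus`) is usually the next place to look.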
Step 2: Installing Node Exporter
Node Exporter is essential for collecting Linux server metrics. It provides hardware and OS metrics exposed by *NIX kernels.
Download and Install Node Exporter
Create a user for Node Exporter:
```bash
sudo useradd --no-create-home --shell /bin/false node_exporter
```
Download Node Exporter:
```bash
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz
tar xvf node_exporter-1.4.0.linux-amd64.tar.gz
cd node_exporter-1.4.0.linux-amd64
sudo cp node_exporter /usr/local/bin
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
```
Configure Node Exporter Service
Create the systemd service file:
```bash
sudo nano /etc/systemd/system/node_exporter.service
```
Add the following configuration:
```ini
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --collector.systemd \
    --collector.processes
[Install]
WantedBy=multi-user.target
```
Enable and start Node Exporter:
```bash
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
```
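To confirm that Node Exporter is serving metrics, the `/metrics` endpoint can be fetched and a single sample picked out. The sketch below assumes `curl` and `awk` are available and that the exporter listens on `localhost:9100`; `metric_value` is a hypothetical helper that only handles samples without labels.

```bash
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of the
# first sample whose metric name is exactly NAME (samples with labels are
# not matched, because their first field includes the label set).
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Prints the 1-minute load average if the exporter is reachable,
# and nothing otherwise.
curl --silent --max-time 5 http://localhost:9100/metrics | metric_value node_load1
```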
Step 3: Configuring Firewall Rules
Configure your firewall to allow necessary connections:
For Ubuntu/Debian (UFW):
```bash
sudo ufw allow 9090/tcp
sudo ufw allow 9100/tcp
sudo ufw allow 9093/tcp
sudo ufw reload
```
For CentOS/RHEL (firewalld):
```bash
sudo firewall-cmd --permanent --add-port=9090/tcp
sudo firewall-cmd --permanent --add-port=9100/tcp
sudo firewall-cmd --permanent --add-port=9093/tcp
sudo firewall-cmd --reload
```
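The rules above open each port to any source address. On monitored hosts it may be preferable to accept connections to port 9100 only from the Prometheus server. A sketch for UFW, assuming `192.0.2.10` as a placeholder for the Prometheus server's address (the variable names are illustrative); the rule is printed first so it can be reviewed before it is applied.

```bash
# Placeholder address; replace with the real Prometheus server IP.
PROM_SERVER_IP="${PROM_SERVER_IP:-192.0.2.10}"

# Build the rule as a string so it can be inspected before running it.
ufw_rule="allow from ${PROM_SERVER_IP} to any port 9100 proto tcp"
echo "Review, then run: sudo ufw ${ufw_rule}"

# To apply on a monitored host once reviewed:
#   sudo ufw delete allow 9100/tcp   # drop the blanket rule added earlier
#   sudo ufw $ufw_rule
```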
Step 4: Setting Up Multiple Server Monitoring
To monitor multiple Linux servers, install Node Exporter on each target server and update the Prometheus configuration.
Install Node Exporter on Target Servers
Repeat the Node Exporter installation process on each server you want to monitor. Ensure Node Exporter is running and accessible on port 9100.
Update Prometheus Configuration
Edit the Prometheus configuration to include multiple targets:
```bash
sudo nano /etc/prometheus/prometheus.yml
```
Update the scrape_configs section:
```yaml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'linux-servers'
    static_configs:
      - targets:
          - 'server1.example.com:9100'
          - 'server2.example.com:9100'
          - '192.168.1.100:9100'
          - '192.168.1.101:9100'
    scrape_interval: 30s
    metrics_path: /metrics
    scrape_timeout: 10s
```
Reload Prometheus configuration:
```bash
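# The unit file above defines no ExecReload, so `systemctl reload` is not available.
# --web.enable-lifecycle exposes an HTTP reload endpoint instead.
curl -X POST http://localhost:9090/-/reload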
```
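After the new targets are loaded, the targets API can show how many of them are being scraped successfully. This is a rough sketch that counts health states with `grep` rather than a JSON parser; it assumes the response from `/api/v1/targets` is compact JSON containing `"health":"up"` or `"health":"down"` per target, and `count_health` is a hypothetical helper name.

```bash
# count_health STATE
# Reads a /api/v1/targets JSON response on stdin and prints how many targets
# report the given health state ("up" or "down").
# Assumes compact JSON with no spaces around ':'.
count_health() {
  grep -o "\"health\":\"$1\"" | wc -l | tr -d ' '
}

targets_json=$(curl --silent --max-time 5 http://localhost:9090/api/v1/targets)
echo "up:   $(printf '%s' "$targets_json" | count_health up)"
echo "down: $(printf '%s' "$targets_json" | count_health down)"
```

The same information is available on the Targets page of the Prometheus web interface.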
Step 5: Creating Alert Rules
Alerting is crucial for proactive monitoring. Create alert rules for common Linux server issues.
Create Alert Rules File
```bash
sudo nano /etc/prometheus/alert_rules.yml
```
Add comprehensive alert rules:
```yaml
groups:
  - name: linux_server_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 85% on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk space is below 10% on {{ $labels.instance }} filesystem {{ $labels.mountpoint }}"

      - alert: HighLoadAverage
        expr: node_load15 > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High load average on {{ $labels.instance }}"
          description: "Load average is {{ $value }} on {{ $labels.instance }}"
```
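Before reloading, it can help to validate the rules and the main configuration with `promtool`, which was installed alongside Prometheus in Step 1. The sketch below assumes both files live in `/etc/prometheus`; `validate_prometheus` is a hypothetical wrapper that skips the check when `promtool` is not on the `PATH`.

```bash
# validate_prometheus CONFIG_DIR
# Returns 0 if both files pass, 2 if promtool is not installed,
# and promtool's own non-zero status if a check fails.
validate_prometheus() {
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found; skipping validation" >&2
    return 2
  fi
  # Check the rule file first, then the main config that references it.
  promtool check rules "$1/alert_rules.yml" &&
    promtool check config "$1/prometheus.yml"
}

validate_prometheus /etc/prometheus ||
  echo "validation did not pass; fix the files before reloading"
```

A failed check reports the problem and its location, which is generally easier to act on than a reload that fails.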
Set proper ownership and reload Prometheus:
```bash
sudo chown prometheus:prometheus /etc/prometheus/alert_rules.yml
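# The unit file defines no ExecReload, so use the lifecycle endpoint
# (enabled by --web.enable-lifecycle) instead of `systemctl reload`.
curl -X POST http://localhost:9090/-/reload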
```
Step 6: Installing and Configuring Alertmanager
Alertmanager handles alerts sent by Prometheus and routes them to notification channels.
Install Alertmanager
Create user and directories:
```bash
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir /etc/alertmanager
sudo mkdir /var/lib/alertmanager
sudo chown alertmanager:alertmanager /etc/alertmanager
sudo chown alertmanager:alertmanager /var/lib/alertmanager
```
Download and install:
```bash
cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar xvf alertmanager-0.25.0.linux-amd64.tar.gz
cd alertmanager-0.25.0.linux-amd64
sudo cp alertmanager /usr/local/bin
sudo cp amtool /usr/local/bin
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager
sudo chown alertmanager:alertmanager /usr/local/bin/amtool
```
Configure Alertmanager
Create the configuration file:
```bash
sudo nano /etc/alertmanager/alertmanager.yml
```
Add basic email notification configuration:
```yaml
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@yourcompany.com'
  smtp_auth_username: 'alerts@yourcompany.com'
  smtp_auth_password: 'your-app-password'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@yourcompany.com'
        # email_configs has no "subject" or "body" field: the subject is set
        # as a header and the plain-text body goes in "text".
        headers:
          Subject: 'Prometheus Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
```
Create Alertmanager service:
```bash
sudo nano /etc/systemd/system/alertmanager.service
```
Add service configuration:
```ini
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager/ \
    --web.listen-address=0.0.0.0:9093
[Install]
WantedBy=multi-user.target
```
Start Alertmanager:
```bash
sudo chown -R alertmanager:alertmanager /etc/alertmanager
sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
```
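Once Alertmanager is running, `amtool` (installed above) can check the configuration file and inject a synthetic alert to exercise the notification path end to end. A sketch, assuming Alertmanager listens on `localhost:9093`; the function name, the `AM_URL` variable, and the `TestAlert` and `test-host` values are placeholders.

```bash
# check_and_test_alertmanager
# Returns 2 if amtool is missing, 1 if the config check fails, otherwise the
# exit status of the test alert submission.
check_and_test_alertmanager() {
  if ! command -v amtool >/dev/null 2>&1; then
    echo "amtool not found; skipping" >&2
    return 2
  fi
  # Parse the configuration file and report syntax or field errors.
  amtool check-config /etc/alertmanager/alertmanager.yml || return 1
  # Fire a synthetic alert; it should arrive through the configured receiver.
  amtool --alertmanager.url="${AM_URL:-http://localhost:9093}" \
    alert add alertname=TestAlert severity=warning instance=test-host \
    --annotation=summary="Test alert sent with amtool"
}

check_and_test_alertmanager || echo "Alertmanager check did not pass"
```

If the alert is accepted but no email arrives, the Alertmanager log usually shows the SMTP error.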
Step 7: Advanced Monitoring Configurations
Service Discovery
For dynamic environments, configure service discovery instead of static targets:
```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: ['node-exporter']
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: job
```
Custom Metrics Collection
Configure Node Exporter to collect additional metrics:
```bash
sudo nano /etc/systemd/system/node_exporter.service
```
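Extend the ExecStart line with additional collectors, then apply the change with `sudo systemctl daemon-reload` followed by `sudo systemctl restart node_exporter`: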
```ini
ExecStart=/usr/local/bin/node_exporter \
    --collector.systemd \
    --collector.processes \
    --collector.interrupts \
    --collector.tcpstat \
    --collector.meminfo_numa
```
Practical Examples and Use Cases
Example 1: CPU Monitoring Query
Monitor CPU usage across all servers:
```promql
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```
Example 2: Memory Usage Monitoring
Track memory utilization:
```promql
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
```
Example 3: Disk I/O Monitoring
Monitor disk read/write operations:
```promql
rate(node_disk_reads_completed_total[5m])
rate(node_disk_writes_completed_total[5m])
```
Example 4: Network Traffic Monitoring
Track network interface traffic:
```promql
rate(node_network_receive_bytes_total{device!="lo"}[5m])
rate(node_network_transmit_bytes_total{device!="lo"}[5m])
```
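The queries above can be typed into the Prometheus web interface, and they can also be run from a shell through the HTTP API at `/api/v1/query`. A minimal sketch, assuming `curl` is installed; `promql_query` and the `PROM_URL` variable are hypothetical names, with `http://localhost:9090` as the assumed default.

```bash
# promql_query EXPR
# Sends an instant query to Prometheus and prints the raw JSON response.
# --data-urlencode takes care of quoting braces, quotes and brackets.
promql_query() {
  curl --silent --max-time 10 --get \
    --data-urlencode "query=$1" \
    "${PROM_URL:-http://localhost:9090}/api/v1/query"
}

promql_query 'rate(node_network_receive_bytes_total{device!="lo"}[5m])' ||
  echo "Prometheus not reachable"
```

The response is JSON; piping it through a formatter such as `jq`, if installed, makes the result vector easier to read.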
Common Issues and Troubleshooting
Issue 1: Prometheus Cannot Scrape Targets
Symptoms: Targets showing as "DOWN" in Prometheus web interface
Solutions:
- Verify Node Exporter is running: `sudo systemctl status node_exporter`
- Check firewall rules allow port 9100
- Verify network connectivity: `telnet target_server 9100`
- Check Prometheus logs: `sudo journalctl -u prometheus -f`
Issue 2: High Memory Usage by Prometheus
Symptoms: Prometheus consuming excessive memory
Solutions:
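- Reduce the retention period below the 15-day default, for example `--storage.tsdb.retention.time=7d` (this mainly saves disk space; memory use is driven largely by the number of active time series)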
- Increase scrape intervals for less critical metrics
- Implement recording rules for complex queries
- Consider federation for large deployments
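As an illustration of the recording-rule suggestion, the sketch below writes a rule file that precomputes the CPU usage expression used elsewhere in this guide (with `rate` in place of `irate`), so dashboards and alerts can read the stored series instead of re-evaluating the full expression. The file name, the group name, the rule name `instance:node_cpu_utilisation:rate5m`, and the `RULES_DIR` variable are all assumptions; the file is written to `/tmp` by default so the snippet runs without root.

```bash
# Directory to write the example into; /tmp keeps the sketch root-free.
RULES_DIR="${RULES_DIR:-/tmp}"

# Quoted heredoc delimiter so the shell does not expand anything in the YAML.
cat > "$RULES_DIR/recording_rules.yml" <<'EOF'
groups:
  - name: node_recording_rules
    rules:
      # Percentage of CPU time spent non-idle, averaged per instance.
      - record: instance:node_cpu_utilisation:rate5m
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
EOF

echo "wrote $RULES_DIR/recording_rules.yml"
```

To use it, the file would be copied to `/etc/prometheus`, listed under `rule_files` in `prometheus.yml`, checked with `promtool check rules`, and loaded with a configuration reload.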
Issue 3: Missing Metrics
Symptoms: Expected metrics not appearing in Prometheus
Solutions:
- Verify Node Exporter collectors are enabled
- Check whether metric names have changed in newer exporter versions
- Ensure proper relabeling configurations
- Verify target labels and job names
Issue 4: Alertmanager Not Sending Notifications
Symptoms: Alerts firing but notifications not received
Solutions:
- Verify SMTP configuration and credentials
- Check Alertmanager logs: `sudo journalctl -u alertmanager -f`
- Test email connectivity from server
- Verify routing rules in alertmanager.yml
Best Practices and Tips
Performance Optimization
1. Tune Scrape Intervals: Set appropriate scrape intervals based on metric importance
2. Use Recording Rules: Pre-calculate complex queries to reduce load
3. Implement Proper Retention: Balance storage costs with data retention needs
4. Monitor Prometheus Itself: Set up monitoring for your monitoring infrastructure
Security Considerations
1. Network Security: Use VPNs or private networks for metric collection
2. Authentication: Implement reverse proxy with authentication for web interfaces
3. Encryption: Use TLS for communication between components
4. Access Control: Limit access to Prometheus and Alertmanager interfaces
Scalability Strategies
1. Federation: Use Prometheus federation for multi-datacenter setups
2. Sharding: Distribute monitoring load across multiple Prometheus instances
3. Remote Storage: Implement remote storage for long-term retention
4. Horizontal Scaling: Scale Alertmanager for high availability
Maintenance Procedures
1. Regular Backups: Backup Prometheus configuration and critical data
2. Update Management: Keep Prometheus and exporters updated
3. Capacity Planning: Monitor storage growth and plan capacity accordingly
4. Documentation: Maintain documentation for custom configurations and procedures
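For the configuration part of the backup routine, a timestamped archive is often enough. The sketch below is one possible approach; `backup_config` and the destination directory are hypothetical. It covers configuration files only: backing up the time-series data itself needs a different mechanism, such as the TSDB snapshot API, which requires the admin API that this guide does not enable.

```bash
# backup_config SRC_DIR DEST_DIR
# Archives SRC_DIR into DEST_DIR as a timestamped .tar.gz and prints the
# path of the archive that was written.
backup_config() {
  archive="$2/prometheus-config-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$2" &&
    tar -czf "$archive" -C "$(dirname "$1")" "$(basename "$1")" &&
    echo "$archive"
}

# Assumed usage (destination path is an example):
#   sudo bash -c "$(declare -f backup_config); backup_config /etc/prometheus /var/backups/prometheus"
```

Running the same function against `/etc/alertmanager` would cover the Alertmanager configuration as well.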
Monitoring Dashboard Creation
While Prometheus provides a basic web interface, consider integrating with Grafana for advanced visualization:
Basic Grafana Integration
1. Install Grafana on the same server or a dedicated instance
2. Add Prometheus as a data source: `http://localhost:9090`
3. Import community dashboards for Linux server monitoring
4. Create custom dashboards for specific use cases
Essential Dashboard Panels
- System overview (CPU, Memory, Disk, Network)
- Process monitoring
- Service status monitoring
- Alert status and history
- Custom application metrics
Conclusion and Next Steps
Implementing Prometheus for Linux server monitoring provides a robust foundation for maintaining system health and performance. This comprehensive setup enables you to:
- Monitor critical system metrics across multiple servers
- Receive timely alerts for potential issues
- Analyze historical performance trends
- Scale monitoring infrastructure as your environment grows
Recommended Next Steps
1. Expand Monitoring Coverage: Add application-specific exporters for services like Apache, Nginx, MySQL, or PostgreSQL
2. Implement Grafana: Set up Grafana dashboards for better visualization and reporting
3. Enhance Alerting: Configure additional notification channels (Slack, PagerDuty, webhooks)
4. Automate Deployment: Use configuration management tools (Ansible, Puppet, Chef) to automate Prometheus deployment
5. Implement High Availability: Run two or more identically configured Prometheus instances scraping the same targets, since Prometheus has no built-in clustering mode
6. Add Custom Metrics: Develop custom exporters for application-specific monitoring needs
By following this guide and implementing these best practices, you'll have a solid foundation for a monitoring solution that scales with your infrastructure needs; add the authentication and TLS measures described under Security Considerations before exposing it beyond a trusted network. Regular maintenance, monitoring of the monitoring system itself, and continuous improvement of alert rules will help keep your Linux servers healthy and performant.
Remember that monitoring is an iterative process. Start with basic metrics and alerts, then gradually expand coverage based on your specific requirements and operational experience. The investment in proper monitoring pays dividends in reduced downtime, faster issue resolution, and improved system reliability.