# How to Monitor Linux Servers with Prometheus

Prometheus has emerged as one of the most powerful and popular monitoring solutions for modern infrastructure. This open-source system monitoring and alerting toolkit provides comprehensive insights into Linux server performance, resource utilization, and system health. Whether you're managing a single server or a complex distributed infrastructure, Prometheus offers the scalability and flexibility needed to maintain optimal system performance.

In this guide, you'll learn how to set up Prometheus to monitor Linux servers effectively. We'll cover everything from basic installation and configuration to advanced monitoring strategies, alerting rules, and best practices that will help you build a robust monitoring infrastructure.

## What is Prometheus and Why Use It for Linux Server Monitoring?

Prometheus is a time-series database and monitoring system originally developed by SoundCloud. It collects metrics from configured targets at specified intervals, evaluates rule expressions, displays results, and triggers alerts when specified conditions are met.
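To make "collects metrics from configured targets" concrete: each target serves its metrics as plain text over HTTP (typically at `/metrics`), one sample per line. The following is a minimal sketch, assuming a few sample lines shaped like Node Exporter output. The values are made up for illustration, and `sample_metrics` and `metric_value` are hypothetical helper names, not part of any Prometheus tool.

```bash
#!/usr/bin/env bash
# Illustrative sample of the Prometheus text exposition format.
# The values below are invented for demonstration purposes.
sample_metrics() {
  cat <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_memory_MemTotal_bytes 8.0e+09
node_filesystem_avail_bytes{mountpoint="/",fstype="ext4"} 1.5e+10
EOF
}

# Print the value of the first sample whose metric name equals $1.
# Comment lines (# HELP / # TYPE) are skipped.
# Assumes label values contain no spaces, which holds for the sample above.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }
    {
      metric = $1
      sub(/[{].*/, "", metric)   # strip the {label="..."} part, if any
      if (metric == name) { print $NF; exit }
    }'
}

sample_metrics | metric_value node_load1
```

Against a live exporter, the same helper could read from `curl -s http://localhost:9100/metrics` instead of `sample_metrics`. Prometheus does this parsing itself on every scrape; the sketch only shows what the data looks like on the wire.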
For Linux server monitoring, Prometheus offers several key advantages:

- Pull-based architecture: Prometheus scrapes metrics from targets over HTTP, so an unreachable target shows up immediately as a failed scrape
- Powerful query language: PromQL allows complex queries and aggregations
- Service discovery: Automatic discovery of monitoring targets
- Built-in alerting: Alert rules are evaluated by Prometheus, and the companion Alertmanager handles notifications
- Scalability: Handles thousands of targets and millions of time series
- Extensive ecosystem: Large collection of exporters for various services

## Prerequisites and Requirements

Before beginning the Prometheus setup process, ensure you have the following prerequisites in place:

### System Requirements

- Linux server with at least 2GB RAM and 10GB free disk space
- Root or sudo access to the target servers
- Network connectivity between Prometheus server and monitored targets
- Basic understanding of Linux command line and system administration

### Software Prerequisites

- Linux distribution (Ubuntu 18.04+, CentOS 7+, or similar)
- Wget or curl for downloading packages
- Text editor (vim, nano, or your preferred editor)
- Firewall configuration knowledge (iptables, ufw, or firewalld)

### Network Requirements

- Open ports: 9090 (Prometheus), 9100 (Node Exporter), 9093 (Alertmanager)
- Stable network connectivity between monitoring components
- DNS resolution or proper /etc/hosts configuration

## Step 1: Installing Prometheus Server

### Download and Install Prometheus

First, create a dedicated user for running Prometheus services:

```bash
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
```

Download a Prometheus release. The commands below use v2.40.0; check the project's releases page for the current version and adjust the file names accordingly:

```bash
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.40.0/prometheus-2.40.0.linux-amd64.tar.gz
tar xvf prometheus-2.40.0.linux-amd64.tar.gz
cd prometheus-2.40.0.linux-amd64
```

Copy the binaries and set proper permissions:

```bash
sudo cp prometheus /usr/local/bin/
sudo cp promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
sudo cp -r consoles /etc/prometheus
sudo cp -r console_libraries /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
```

### Configure Prometheus

Create the main Prometheus configuration file:

```bash
sudo nano /etc/prometheus/prometheus.yml
```

Add the following basic configuration:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'linux-servers'
    static_configs:
      - targets: ['localhost:9100']
```

Set proper ownership:

```bash
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
```

### Create Prometheus Service

Create a systemd service file:

```bash
sudo nano /etc/systemd/system/prometheus.service
```

Add the following configuration:

```ini
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle

[Install]
WantedBy=multi-user.target
```

Enable and start the Prometheus service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
```

## Step 2: Installing Node Exporter

Node Exporter is essential for collecting Linux server metrics.
It provides hardware and OS metrics exposed by *NIX kernels.

### Download and Install Node Exporter

Create a user for Node Exporter:

```bash
sudo useradd --no-create-home --shell /bin/false node_exporter
```

Download Node Exporter:

```bash
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz
tar xvf node_exporter-1.4.0.linux-amd64.tar.gz
cd node_exporter-1.4.0.linux-amd64
sudo cp node_exporter /usr/local/bin
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
```

### Configure Node Exporter Service

Create the systemd service file:

```bash
sudo nano /etc/systemd/system/node_exporter.service
```

Add the following configuration:

```ini
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --collector.systemd \
    --collector.processes

[Install]
WantedBy=multi-user.target
```

Enable and start Node Exporter:

```bash
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
```

## Step 3: Configuring Firewall Rules

Configure your firewall to allow the necessary connections.

For Ubuntu/Debian (UFW):

```bash
sudo ufw allow 9090/tcp
sudo ufw allow 9100/tcp
sudo ufw allow 9093/tcp
sudo ufw reload
```

For CentOS/RHEL (firewalld):

```bash
sudo firewall-cmd --permanent --add-port=9090/tcp
sudo firewall-cmd --permanent --add-port=9100/tcp
sudo firewall-cmd --permanent --add-port=9093/tcp
sudo firewall-cmd --reload
```

## Step 4: Setting Up Multiple Server Monitoring

To monitor multiple Linux servers, install Node Exporter on each target server and update the Prometheus configuration.

### Install Node Exporter on Target Servers

Repeat the Node Exporter installation process on each server you want to monitor. Ensure Node Exporter is running and accessible on port 9100.
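The reachability check described above can be scripted from the Prometheus server. This is a minimal sketch, assuming `curl` is installed and that targets are passed as `host:port` arguments. The function names `metrics_url` and `check_target` are hypothetical helpers introduced here for illustration.

```bash
#!/usr/bin/env bash
# Build the metrics URL for a host:port target.
metrics_url() {
  printf 'http://%s/metrics\n' "$1"
}

# Print UP if the exporter answers within 5 seconds, DOWN otherwise.
check_target() {
  if curl --silent --fail --max-time 5 --output /dev/null "$(metrics_url "$1")"; then
    echo "UP   $1"
  else
    echo "DOWN $1"
  fi
}

# Check every target given on the command line (no arguments: nothing to do).
for t in "$@"; do
  check_target "$t"
done
```

Saved as, say, `check_targets.sh`, it could be run as `bash check_targets.sh server1.example.com:9100 192.168.1.100:9100`. A DOWN result usually points to a stopped service or a firewall rule, and is quicker to diagnose here than after the target has been added to Prometheus.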
### Update Prometheus Configuration

Edit the Prometheus configuration to include multiple targets:

```bash
sudo nano /etc/prometheus/prometheus.yml
```

Update the scrape_configs section:

```yaml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'linux-servers'
    static_configs:
      - targets:
          - 'server1.example.com:9100'
          - 'server2.example.com:9100'
          - '192.168.1.100:9100'
          - '192.168.1.101:9100'
    scrape_interval: 30s
    metrics_path: /metrics
    scrape_timeout: 10s
```

Validate the file, then reload the Prometheus configuration. The service unit from Step 1 defines no reload action, so `systemctl reload` would fail; use the HTTP reload endpoint instead, which is available because Prometheus was started with `--web.enable-lifecycle`:

```bash
promtool check config /etc/prometheus/prometheus.yml
curl -X POST http://localhost:9090/-/reload
```

## Step 5: Creating Alert Rules

Alerting is crucial for proactive monitoring. Create alert rules for common Linux server issues.

### Create Alert Rules File

```bash
sudo nano /etc/prometheus/alert_rules.yml
```

Add the following alert rules. The CPU rule uses `rate()` rather than `irate()`, since `irate()` looks only at the last two samples and is better suited to graphing than to alerting:

```yaml
groups:
  - name: linux_server_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 85% on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk space is below 10% on {{ $labels.instance }} filesystem {{ $labels.mountpoint }}"

      - alert: HighLoadAverage
        expr: node_load15 > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High load average on {{ $labels.instance }}"
          description: "Load average is {{ $value }} on {{ $labels.instance }}"
```

Set proper ownership, validate the rules, and reload Prometheus:

```bash
sudo chown prometheus:prometheus /etc/prometheus/alert_rules.yml
promtool check rules /etc/prometheus/alert_rules.yml
curl -X POST http://localhost:9090/-/reload
```

## Step 6: Installing and Configuring Alertmanager

Alertmanager handles alerts sent by Prometheus and routes them to notification channels.
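To make "alerts sent by Prometheus" concrete: each alert reaches Alertmanager as a JSON object carrying labels and annotations, posted to its `/api/v2/alerts` endpoint. The sketch below builds such a payload by hand so that, once Alertmanager is running, a test alert can be pushed through the notification path without waiting for a real rule to fire. `alert_payload` and `send_test_alert` are hypothetical helper names, the label values are illustrative, and inputs are assumed to contain no characters that need JSON escaping.

```bash
#!/usr/bin/env bash
# Build a one-alert JSON payload.
# $1 = alertname, $2 = instance, $3 = severity (illustrative values only).
alert_payload() {
  printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"%s"},"annotations":{"summary":"Test alert for %s"}}]\n' \
    "$1" "$2" "$3" "$2"
}

# Post a test alert to Alertmanager.
# $1 = Alertmanager base URL, for example http://localhost:9093
send_test_alert() {
  alert_payload "TestAlert" "localhost:9100" "warning" |
    curl --silent --fail -X POST \
      -H 'Content-Type: application/json' \
      --data @- "$1/api/v2/alerts"
}

# Show the payload without sending anything.
alert_payload "TestAlert" "localhost:9100" "warning"
```

After the installation steps that follow, calling `send_test_alert http://localhost:9093` should make the test alert appear in the Alertmanager web interface and pass through whatever route and receiver are configured.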
### Install Alertmanager

Create user and directories:

```bash
sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir /etc/alertmanager
sudo mkdir /var/lib/alertmanager
sudo chown alertmanager:alertmanager /etc/alertmanager
sudo chown alertmanager:alertmanager /var/lib/alertmanager
```

Download and install:

```bash
cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar xvf alertmanager-0.25.0.linux-amd64.tar.gz
cd alertmanager-0.25.0.linux-amd64
sudo cp alertmanager /usr/local/bin
sudo cp amtool /usr/local/bin
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager
sudo chown alertmanager:alertmanager /usr/local/bin/amtool
```

### Configure Alertmanager

Create the configuration file:

```bash
sudo nano /etc/alertmanager/alertmanager.yml
```

Add a basic email notification configuration. Note that the email receiver sets the subject through `headers` and the plain-text body through `text`; it has no `subject` or `body` fields:

```yaml
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@yourcompany.com'
  smtp_auth_username: 'alerts@yourcompany.com'
  smtp_auth_password: 'your-app-password'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@yourcompany.com'
        headers:
          Subject: 'Prometheus Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
```

Create Alertmanager service:

```bash
sudo nano /etc/systemd/system/alertmanager.service
```

Add service configuration:

```ini
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager/ \
    --web.listen-address=0.0.0.0:9093

[Install]
WantedBy=multi-user.target
```

Start Alertmanager:

```bash
sudo chown -R alertmanager:alertmanager /etc/alertmanager
sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
```

## Step 7: Advanced Monitoring Configurations

### Service Discovery

For dynamic environments, configure service discovery instead of static targets:

```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: ['node-exporter']
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: job
```

### Custom Metrics Collection

Configure Node Exporter to collect additional metrics:

```bash
sudo nano /etc/systemd/system/node_exporter.service
```

Add custom collectors:

```ini
ExecStart=/usr/local/bin/node_exporter \
    --collector.systemd \
    --collector.processes \
    --collector.interrupts \
    --collector.tcpstat \
    --collector.meminfo_numa
```

After editing the unit file, run `sudo systemctl daemon-reload` and `sudo systemctl restart node_exporter` so the new collectors take effect.

## Practical Examples and Use Cases

### Example 1: CPU Monitoring Query

Monitor CPU usage across all servers:

```promql
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```

### Example 2: Memory Usage Monitoring

Track memory utilization:

```promql
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
```

### Example 3: Disk I/O Monitoring

Monitor disk read/write operations (two separate queries):

```promql
rate(node_disk_reads_completed_total[5m])
rate(node_disk_writes_completed_total[5m])
```

### Example 4: Network Traffic Monitoring

Track network interface traffic (two separate queries):

```promql
rate(node_network_receive_bytes_total{device!="lo"}[5m])
rate(node_network_transmit_bytes_total{device!="lo"}[5m])
```

## Common Issues and Troubleshooting

### Issue 1: Prometheus Cannot Scrape Targets

Symptoms: Targets showing as "DOWN" in the Prometheus web interface

Solutions:

- Verify Node Exporter is running: `sudo systemctl status node_exporter`
- Check that firewall rules allow port 9100
- Verify network connectivity: `telnet target_server 9100`
- Check Prometheus logs: `sudo journalctl -u prometheus -f`

### Issue 2: High Memory Usage by Prometheus

Symptoms: Prometheus consuming excessive memory

Solutions:

- Reduce retention period: `--storage.tsdb.retention.time=15d` (this mainly saves disk space; memory use is driven mostly by the number of active time series)
- Increase scrape intervals for less critical metrics
- Implement recording rules for complex queries
- Consider federation for large deployments

### Issue 3: Missing Metrics

Symptoms: Expected metrics not appearing in Prometheus

Solutions:

- Verify Node Exporter collectors are enabled
- Check whether metric names have changed in newer versions
- Ensure proper relabeling configurations
- Verify target labels and job names

### Issue 4: Alertmanager Not Sending Notifications

Symptoms: Alerts firing but notifications not received

Solutions:

- Verify SMTP configuration and credentials
- Check Alertmanager logs: `sudo journalctl -u alertmanager -f`
- Test email connectivity from the server
- Verify routing rules in alertmanager.yml

## Best Practices and Tips

### Performance Optimization

1. Tune Scrape Intervals: Set appropriate scrape intervals based on metric importance
2. Use Recording Rules: Pre-calculate complex queries to reduce load
3. Implement Proper Retention: Balance storage costs with data retention needs
4. Monitor Prometheus Itself: Set up monitoring for your monitoring infrastructure

### Security Considerations

1. Network Security: Use VPNs or private networks for metric collection
2. Authentication: Implement a reverse proxy with authentication for web interfaces
3. Encryption: Use TLS for communication between components
4. Access Control: Limit access to Prometheus and Alertmanager interfaces

### Scalability Strategies

1. Federation: Use Prometheus federation for multi-datacenter setups
2. Sharding: Distribute monitoring load across multiple Prometheus instances
3. Remote Storage: Implement remote storage for long-term retention
4. Horizontal Scaling: Scale Alertmanager for high availability

### Maintenance Procedures

1. Regular Backups: Back up Prometheus configuration and critical data
2. Update Management: Keep Prometheus and exporters updated
3. Capacity Planning: Monitor storage growth and plan capacity accordingly
4. Documentation: Maintain documentation for custom configurations and procedures

## Monitoring Dashboard Creation

While Prometheus provides a basic web interface, consider integrating with Grafana for advanced visualization.

### Basic Grafana Integration

1. Install Grafana on the same server or a dedicated instance
2. Add Prometheus as a data source: `http://localhost:9090`
3. Import community dashboards for Linux server monitoring
4. Create custom dashboards for specific use cases

### Essential Dashboard Panels

- System overview (CPU, Memory, Disk, Network)
- Process monitoring
- Service status monitoring
- Alert status and history
- Custom application metrics

## Conclusion and Next Steps

Implementing Prometheus for Linux server monitoring provides a robust foundation for maintaining system health and performance. This setup enables you to:

- Monitor critical system metrics across multiple servers
- Receive timely alerts for potential issues
- Analyze historical performance trends
- Scale monitoring infrastructure as your environment grows

### Recommended Next Steps

1. Expand Monitoring Coverage: Add application-specific exporters for services like Apache, Nginx, MySQL, or PostgreSQL
2. Implement Grafana: Set up Grafana dashboards for better visualization and reporting
3. Enhance Alerting: Configure additional notification channels (Slack, PagerDuty, webhooks)
4. Automate Deployment: Use configuration management tools (Ansible, Puppet, Chef) to automate Prometheus deployment
5. Implement High Availability: Run two or more identically configured Prometheus instances so that monitoring survives the loss of one
6. Add Custom Metrics: Develop custom exporters for application-specific monitoring needs

By following this guide and implementing these best practices, you'll have a production-ready monitoring solution that scales with your infrastructure needs.
Regular maintenance, monitoring of the monitoring system itself, and continuous improvement of alert rules will ensure your Linux servers remain healthy and performant.

Remember that monitoring is an iterative process. Start with basic metrics and alerts, then gradually expand coverage based on your specific requirements and operational experience. The investment in proper monitoring pays dividends in reduced downtime, faster issue resolution, and improved system reliability.