How to configure Elasticsearch cluster in Linux

Elasticsearch is a distributed search and analytics engine that underpins many modern applications. Running it as a cluster on Linux provides high availability, fault tolerance, and better performance by spreading data and queries across several nodes. This guide walks through configuring a production-ready Elasticsearch cluster on Linux systems.

## Table of Contents

1. [Introduction](#introduction)
2. [Prerequisites and Requirements](#prerequisites-and-requirements)
3. [System Preparation](#system-preparation)
4. [Installing Elasticsearch](#installing-elasticsearch)
5. [Cluster Configuration](#cluster-configuration)
6. [Node Configuration](#node-configuration)
7. [Security Configuration](#security-configuration)
8. [Starting and Managing the Cluster](#starting-and-managing-the-cluster)
9. [Monitoring and Health Checks](#monitoring-and-health-checks)
10. [Troubleshooting Common Issues](#troubleshooting-common-issues)
11. [Best Practices](#best-practices)
12. [Conclusion](#conclusion)

## Introduction

An Elasticsearch cluster consists of multiple nodes working together to store, index, and search data. Unlike a single-node setup, a cluster provides redundancy and scalability, and it spreads load across machines. Each node can take on different roles: master-eligible nodes manage cluster state, data nodes store and process data, and coordinating nodes handle client requests.

This guide covers setting up a multi-node Elasticsearch cluster with security, monitoring, and tuning suitable for production environments.
## Prerequisites and Requirements

### System Requirements

Before beginning the installation, ensure your Linux systems meet the following requirements.

**Hardware Requirements:**

- RAM: Minimum 8GB per node (16GB+ recommended for production)
- CPU: Multi-core processor (4+ cores recommended)
- Storage: SSD storage recommended for better I/O performance
- Network: Reliable network connectivity between nodes

**Software Requirements:**

- Operating System: Ubuntu 18.04+, CentOS 7+, RHEL 7+, or a similar Linux distribution
- Java: OpenJDK 11 or Oracle JDK 11, needed only if you are not using the JDK bundled with Elasticsearch 7.x and later
- Root or sudo access on all cluster nodes

### Network Configuration

Ensure the following network requirements are met:

- All nodes can communicate with each other on ports 9200 (HTTP) and 9300 (transport)
- Firewall rules allow traffic between cluster nodes
- Each node has a static IP address or reliable hostname resolution
- Network latency between nodes is minimal (preferably < 1ms)

## System Preparation

### Step 1: Update System Packages

On each node, update the system packages:

```bash
# Ubuntu/Debian
sudo apt update && sudo apt upgrade -y

# CentOS/RHEL
sudo yum update -y
# or, on newer versions
sudo dnf update -y
```

### Step 2: Configure System Limits

Elasticsearch requires specific system limits to function properly.
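Before changing anything, it can help to see where the current values stand. The following is a minimal sketch and not part of Elasticsearch: the function names `meets_minimum`, `report_limit`, and `check_limits` are made up for illustration, and the thresholds are the ones used in this guide. Keep in mind that `ulimit` reports the limits of the current shell user, which may differ from those the `elasticsearch` service runs with.

```bash
#!/usr/bin/env bash

# Succeed when a current limit meets a required minimum.
# The literal value "unlimited" always passes; non-numeric input fails.
meets_minimum() {
  local current="$1" required="$2"
  [ "$current" = "unlimited" ] && return 0
  [ "$current" -ge "$required" ] 2>/dev/null
}

# Print one OK/LOW line for a named limit.
report_limit() {
  local name="$1" current="$2" required="$3"
  if meets_minimum "$current" "$required"; then
    echo "OK $name=$current (need >= $required)"
  else
    echo "LOW $name=$current (need >= $required)"
  fi
}

# Compare this shell's limits and the kernel setting with the guide's values.
check_limits() {
  report_limit "nofile" "$(ulimit -n)" 65536
  report_limit "nproc" "$(ulimit -u)" 4096
  report_limit "vm.max_map_count" "$(cat /proc/sys/vm/max_map_count)" 262144
}
```

Calling `check_limits` prints one line per setting; any line starting with `LOW` points at a value that still needs raising.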
Edit the limits configuration:

```bash
sudo vim /etc/security/limits.conf
```

Add the following lines:

```bash
elasticsearch soft nofile 65536
elasticsearch hard nofile 65536
elasticsearch soft nproc 4096
elasticsearch hard nproc 4096
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
```

These entries apply to sessions started through PAM. When Elasticsearch runs under systemd, as it does with the DEB and RPM packages, set the limits on the unit instead, for example `LimitMEMLOCK=infinity` in a drop-in created with `sudo systemctl edit elasticsearch`.

### Step 3: Configure Virtual Memory

Set the virtual memory map count:

```bash
sudo sysctl -w vm.max_map_count=262144
```

Make this setting permanent:

```bash
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
```

### Step 4: Disable Swap

Disable swap to prevent performance issues:

```bash
sudo swapoff -a
```

Comment out swap entries in `/etc/fstab` so swap stays off after a reboot:

```bash
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
```

## Installing Elasticsearch

### Method 1: Using Package Repository (Recommended)

Import the Elasticsearch GPG key. On current Debian and Ubuntu releases `apt-key` is deprecated, so store the key in a dedicated keyring:

```bash
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
```

Add the repository:

```bash
# Ubuntu/Debian
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# CentOS/RHEL
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
```

For CentOS/RHEL, create the repository file:

```bash
sudo tee /etc/yum.repos.d/elasticsearch.repo <
```

## Cluster Configuration

- Master-eligible nodes: Manage cluster state (minimum 3 for high availability)
- Data nodes: Store and process data
- Coordinating nodes: Handle client requests and distribute queries
- Ingest nodes: Pre-process documents before indexing

### Step 1: Configure Cluster Discovery

Create or edit the main configuration file `/etc/elasticsearch/elasticsearch.yml`:

```yaml
# Cluster configuration
cluster.name: production-cluster
node.name: node-1

# Network configuration
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300

# Discovery configuration
discovery.seed_hosts:
  - "192.168.1.10:9300"
  - "192.168.1.11:9300"
  - "192.168.1.12:9300"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"

# Path configuration
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

# Memory configuration
bootstrap.memory_lock: true

# Security configuration (for Elasticsearch 8.x)
xpack.security.enabled: true
xpack.security.enrollment.enabled: true
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
```

`cluster.initial_master_nodes` is only used the first time a new cluster bootstraps. Its entries must match the `node.name` values exactly, and the setting should be removed from every node once the cluster has formed.

### Step 2: Configure JVM Settings

Edit the JVM options file `/etc/elasticsearch/jvm.options`:

```bash
# Heap size: at most 50% of available RAM, and below the compressed-oops
# threshold (roughly 30GB); keep -Xms and -Xmx equal
-Xms4g
-Xmx4g

# GC configuration
-XX:+UseG1GC
-XX:G1HeapRegionSize=16m
-XX:+UnlockExperimentalVMOptions
-XX:MaxGCPauseMillis=50

# Encoding and temporary directory
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=${ES_TMPDIR}
```

## Node Configuration

From Elasticsearch 7.9 onward, roles are assigned with `node.roles`. The legacy `node.master`, `node.data`, and `node.ingest` settings cannot be combined with it and no longer exist in 8.x, so the examples below use `node.roles` only.

### Master-Eligible Node Configuration

For dedicated master nodes, configure as follows:

```yaml
# Node roles: listing only "master" keeps data and ingest work off this node
node.roles: [ master ]

# Node identification
node.name: master-node-1
cluster.name: production-cluster

# Network settings
network.host: 192.168.1.10
http.port: 9200
transport.port: 9300

# Discovery settings
discovery.seed_hosts:
  - "192.168.1.10:9300"
  - "192.168.1.11:9300"
  - "192.168.1.12:9300"
cluster.initial_master_nodes:
  - "master-node-1"
  - "master-node-2"
  - "master-node-3"
```

### Data Node Configuration

For dedicated data nodes:

```yaml
# Node roles: the generic data role stores data for every tier.
# To pin a node to specific tiers, list data_content, data_hot,
# data_warm, or data_cold instead.
node.roles: [ data ]

# Node identification
node.name: data-node-1
cluster.name: production-cluster

# Storage paths (multiple data paths are deprecated since 7.13)
path.data: ["/data1/elasticsearch", "/data2/elasticsearch"]

# Network settings
network.host: 192.168.1.20
http.port: 9200
transport.port: 9300

# Discovery settings
discovery.seed_hosts:
  - "192.168.1.10:9300"
  - "192.168.1.11:9300"
  - "192.168.1.12:9300"
```

### Coordinating Node Configuration

For dedicated coordinating nodes:
```yaml
# Node roles: an empty list makes this a coordinating-only node
# (no master, data, or ingest role)
node.roles: []

# Node identification
node.name: coordinating-node-1
cluster.name: production-cluster

# Network settings
network.host: 192.168.1.30
http.port: 9200
transport.port: 9300
```

## Security Configuration

### Step 1: Enable X-Pack Security

For Elasticsearch 8.x, security is enabled by default. For older versions, enable it manually:

```yaml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true
```

### Step 2: Generate Certificates

Generate certificates for secure communication:

```bash
# Generate CA certificate
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil ca --out /etc/elasticsearch/certs/elastic-stack-ca.p12 --pass ""

# Generate node certificates
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca /etc/elasticsearch/certs/elastic-stack-ca.p12 --out /etc/elasticsearch/certs/elastic-certificates.p12 --pass ""

# Set proper permissions
sudo chown elasticsearch:elasticsearch /etc/elasticsearch/certs/*
sudo chmod 660 /etc/elasticsearch/certs/*
```

Generate the CA once, then copy `elastic-certificates.p12` to the same path on every node so that all nodes trust the same authority.

### Step 3: Configure SSL/TLS

Update the configuration file:

```yaml
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/elastic-certificates.p12
  truststore.path: certs/elastic-certificates.p12
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/elastic-certificates.p12
```

### Step 4: Set Up Authentication

Generate passwords for built-in users. In Elasticsearch 8.x, `elasticsearch-setup-passwords` is deprecated in favor of `elasticsearch-reset-password`, which resets one user at a time:

```bash
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto
```

Save the generated passwords securely.
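One way to save the output securely is to write it straight to a file that only its owner can read, instead of copying it from the terminal. This is a minimal sketch under stated assumptions: `save_secret` is a hypothetical helper name and `/root/es-passwords.txt` is a hypothetical path, neither of which comes from Elasticsearch.

```bash
#!/usr/bin/env bash

# Hypothetical helper: write stdin to a file that only the owner can read.
save_secret() {
  local target="$1"
  (
    umask 077            # files created in this subshell get mode 600
    cat > "$target"
  )
  chmod 600 "$target"    # also tighten a file that already existed
}

# Usage (assumed path), run from a root shell so the whole pipe is privileged:
#   /usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto --batch \
#     | save_secret /root/es-passwords.txt
```

The `--batch` flag skips the confirmation prompt so that only the generated credentials end up in the file. Moving them into a password manager or secrets store afterwards and deleting the file is a reasonable next step.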
You can also set passwords interactively:

```bash
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
```

## Starting and Managing the Cluster

### Step 1: Enable and Start Elasticsearch Service

On each node, enable and start the Elasticsearch service:

```bash
# Enable service to start on boot
sudo systemctl enable elasticsearch

# Start the service
sudo systemctl start elasticsearch

# Check service status
sudo systemctl status elasticsearch
```

### Step 2: Verify Cluster Formation

Check if nodes have joined the cluster:

```bash
# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# List cluster nodes
curl -X GET "localhost:9200/_cat/nodes?v"

# Check cluster state
curl -X GET "localhost:9200/_cluster/state?pretty"
```

If security is enabled, use HTTPS and authentication. The `-k` flag skips certificate verification, so prefer `--cacert` with your CA certificate outside of quick tests:

```bash
curl -u elastic:password -X GET "https://localhost:9200/_cluster/health?pretty" -k
```

### Step 3: Configure Service Management

Create a systemd service file if using manual installation:

```bash
sudo tee /etc/systemd/system/elasticsearch.service <
```

## Troubleshooting Common Issues

### Issue 1

Symptoms: Node starts but doesn't appear in cluster node list.

Solutions:

1. Check network connectivity to the transport port of another node:

   ```bash
   telnet <node-ip> 9300
   ```

2. Verify discovery configuration:

   ```yaml
   discovery.seed_hosts:
     - "correct-ip:9300"
   ```

3. Check firewall rules:

   ```bash
   sudo ufw allow 9200
   sudo ufw allow 9300
   ```

### Issue 2: Split-Brain Prevention

Symptoms: Multiple master nodes elected simultaneously.

Solution: On Elasticsearch 7.x and later the cluster manages its own voting quorum, and `discovery.zen.minimum_master_nodes` has been removed, so there is nothing to set. Run at least three master-eligible nodes and bootstrap with `cluster.initial_master_nodes`. On 6.x and earlier only, configure the minimum master nodes as a majority of the master-eligible nodes:

```yaml
# Elasticsearch 6.x and earlier only
# For 3 master-eligible nodes
discovery.zen.minimum_master_nodes: 2

# For 5 master-eligible nodes
discovery.zen.minimum_master_nodes: 3
```

### Issue 3: Memory Issues

Symptoms: OutOfMemoryError or high GC pressure.

Solutions:

1. Adjust heap size:

   ```bash
   # In jvm.options
   -Xms8g
   -Xmx8g
   ```

2. Enable memory lock:

   ```yaml
   bootstrap.memory_lock: true
   ```

3. Monitor field data usage:

   ```bash
   curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?pretty"
   ```

### Issue 4: Disk Space Issues

Symptoms: Indices become read-only once a node's disk crosses the flood-stage watermark.

Solutions:

1. Check disk watermarks:

   ```yaml
   cluster.routing.allocation.disk.watermark.low: 85%
   cluster.routing.allocation.disk.watermark.high: 90%
   cluster.routing.allocation.disk.watermark.flood_stage: 95%
   ```

2. Clean up old indices. Elasticsearch 8.x rejects wildcard deletes by default (`action.destructive_requires_name: true`), so either name the indices explicitly or relax that setting first:

   ```bash
   # Delete old indices
   curl -X DELETE "localhost:9200/old-index-*"

   # Use Index Lifecycle Management (ILM)
   curl -X PUT "localhost:9200/_ilm/policy/cleanup-policy" -H 'Content-Type: application/json' -d'
   {
     "policy": {
       "phases": {
         "delete": {
           "min_age": "30d",
           "actions": {
             "delete": {}
           }
         }
       }
     }
   }'
   ```

### Issue 5: SSL/TLS Certificate Problems

Symptoms: SSL handshake failures or certificate errors.

Solutions:

1. Regenerate certificates:

   ```bash
   sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12 --out elastic-certificates.p12
   ```

2. Verify certificate configuration:

   ```yaml
   xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
   xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
   ```

3. Check certificate permissions:

   ```bash
   sudo chown elasticsearch:elasticsearch /etc/elasticsearch/certs/*
   sudo chmod 660 /etc/elasticsearch/certs/*
   ```

## Best Practices

### Hardware and Infrastructure

1. Use SSD storage for better I/O performance
2. Separate master and data nodes in large clusters
3. Use dedicated coordinating nodes for heavy query loads
4. Implement proper network segmentation for security
5. Use load balancers for client connections

### Configuration Optimization

1. Set appropriate heap sizes (at most 50% of RAM, and below roughly 30GB so the JVM keeps using compressed object pointers)
2. Configure thread pools only when measurements show the defaults are a bottleneck:

   ```yaml
   thread_pool:
     search:
       size: 30
       queue_size: 1000
     write:
       size: 30
       queue_size: 200
   ```

3. Optimize index settings. Note that `async` translog durability trades safety for speed: writes acknowledged since the last translog sync can be lost if a node crashes.

   ```json
   {
     "settings": {
       "number_of_shards": 1,
       "number_of_replicas": 1,
       "refresh_interval": "30s",
       "index.translog.durability": "async"
     }
   }
   ```

### Security Best Practices

1. Enable X-Pack Security with proper authentication
2. Use TLS/SSL for all communications
3. Implement role-based access control (RBAC)
4. Apply security updates and patches regularly
5. Protect the network with firewalls and VPNs

### Monitoring and Maintenance

1. Implement monitoring with tools like Metricbeat
2. Set up alerting for critical metrics
3. Take regular backups using snapshot repositories
4. Use index lifecycle management for automated cleanup
5. Run performance tests and plan capacity ahead of growth

### Backup Strategy

Implement automated backups. A shared filesystem repository only works if its location is listed under `path.repo` in `elasticsearch.yml` on every node and is reachable from all of them:

```bash
# Create snapshot repository
curl -X PUT "localhost:9200/_snapshot/backup_repository" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/backup/elasticsearch"
  }
}'

# Create snapshot
curl -X PUT "localhost:9200/_snapshot/backup_repository/snapshot_1"
```

## Conclusion

Setting up an Elasticsearch cluster in Linux requires careful planning and attention to detail. This guide has covered the main steps of cluster configuration, from initial system preparation to security and monitoring setup.

Key takeaways:

1. Proper planning is essential for cluster architecture and node roles
2. Security configuration should be implemented from the beginning
3. Monitoring and maintenance are crucial for production stability
4. Performance optimization requires ongoing tuning and adjustment
5. Backup strategies ensure data protection and disaster recovery

### Next Steps

After successfully setting up your Elasticsearch cluster:

1. Implement monitoring solutions like Kibana and Metricbeat
2. Set up index templates and lifecycle policies
3. Configure client applications to use the cluster
4. Plan for scaling as data and query volumes grow
5. Establish operational procedures for maintenance and troubleshooting

With proper configuration and maintenance, your Elasticsearch cluster will provide reliable, scalable search and analytics capabilities for your applications. Stay current with Elasticsearch releases and security patches, and always test configuration changes in a development environment before applying them to production systems.
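As a companion to the `_cluster/health` check used in this guide, the following sketch polls the cluster until it reports green, which can be handy after a rolling restart or a configuration change. It is an illustration only: the function names `extract_status` and `wait_for_green`, the `ES_URL` variable, the attempt count, and the 5-second pause are all assumptions, and a secured cluster would additionally need `https`, credentials, and a CA certificate passed to `curl`.

```bash
#!/usr/bin/env bash

# Hypothetical helper: pull the "status" value out of a _cluster/health JSON body.
# Uses sed rather than jq so it also runs on a minimal install.
extract_status() {
  printf '%s\n' "$1" \
    | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' \
    | head -n 1
}

# Poll until the cluster reports green or the attempts run out.
# Returns 0 on green, 1 otherwise.
wait_for_green() {
  local url="${ES_URL:-http://localhost:9200}" attempts="${1:-12}"
  local body status i
  for i in $(seq 1 "$attempts"); do
    body="$(curl -s "$url/_cluster/health")" || body=""
    status="$(extract_status "$body")"
    [ "$status" = "green" ] && return 0
    sleep 5
  done
  return 1
}
```

A call such as `wait_for_green 24 && echo "cluster is green"` waits up to about two minutes before giving up.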