How to Configure Persistent Storage in Kubernetes on Linux
Persistent storage is a critical component of any production Kubernetes cluster, enabling applications to maintain data beyond the lifecycle of individual pods. Unlike ephemeral storage that disappears when a pod terminates, persistent storage ensures data durability and availability across pod restarts, node failures, and cluster maintenance operations.
This guide walks through configuring persistent storage in Kubernetes on Linux, from basic concepts to advanced storage management. You'll learn how to create Persistent Volumes (PVs) and Persistent Volume Claims (PVCs), configure Storage Classes, and use several storage backends to meet your application requirements.
Prerequisites and Requirements
Before diving into persistent storage configuration, ensure you have the following prerequisites in place:
System Requirements
- A running Kubernetes cluster (version 1.20 or later recommended)
- Linux-based worker nodes (Ubuntu 18.04+, CentOS 7+, or RHEL 7+)
- Administrative access to the cluster (kubectl with cluster-admin privileges)
- Sufficient storage resources on your nodes or external storage systems
Required Tools and Knowledge
- kubectl: Kubernetes command-line tool properly configured
- Basic Kubernetes concepts: Understanding of pods, deployments, and services
- Linux storage fundamentals: Knowledge of filesystems, mount points, and storage devices
- YAML syntax: Familiarity with Kubernetes manifest files
Storage Backend Options
Choose one or more storage backends based on your infrastructure:
- Local storage: Direct attached storage on worker nodes
- Network File System (NFS): Shared network storage
- Cloud storage: AWS EBS, Google Persistent Disk, Azure Disk
- Distributed storage: Ceph, GlusterFS, or Longhorn
- Container Storage Interface (CSI) drivers for various storage solutions
Understanding Kubernetes Storage Concepts
Persistent Volumes (PVs)
Persistent Volumes represent storage resources in your cluster, abstracting the underlying storage implementation from applications. PVs are cluster-wide resources that exist independently of any pod that uses them.
Persistent Volume Claims (PVCs)
Persistent Volume Claims are requests for storage by applications. They specify storage requirements such as size, access modes, and storage classes. PVCs bind to available PVs that meet their requirements.
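One of the binding rules is that a claim can only bind to a volume at least as large as the request. The sketch below illustrates only that size comparison. `to_bytes` and `pv_satisfies` are hypothetical helper names, only whole numbers with binary suffixes (Ki, Mi, Gi, Ti) are handled, and the real controller also checks access modes, storage class, and selectors.

```bash
# Hypothetical helper: convert a quantity such as 10Gi to bytes.
to_bytes() {
  local qty="$1" num unit
  num="${qty%%[KMGT]i}"      # strip a binary suffix, leaving the number
  unit="${qty#"$num"}"       # the suffix that was stripped, if any
  [[ "$num" =~ ^[0-9]+$ ]] || return 1
  case "$unit" in
    Ki) echo $(( num * 1024 )) ;;
    Mi) echo $(( num * 1024 ** 2 )) ;;
    Gi) echo $(( num * 1024 ** 3 )) ;;
    Ti) echo $(( num * 1024 ** 4 )) ;;
    "") echo "$num" ;;
    *)  return 1 ;;
  esac
}

# Hypothetical helper: succeed if a PV of size $1 can hold a claim of size $2.
pv_satisfies() {
  local pv_bytes pvc_bytes
  pv_bytes="$(to_bytes "$1")" || return 2
  pvc_bytes="$(to_bytes "$2")" || return 2
  [ "$pv_bytes" -ge "$pvc_bytes" ]
}
```

For example, `pv_satisfies 10Gi 8Gi` succeeds, while `pv_satisfies 10Gi 20Gi` fails.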
Storage Classes
Storage Classes provide a way to describe different types of storage available in your cluster. They enable dynamic provisioning of storage resources and define parameters for storage creation.
Access Modes
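Kubernetes supports four access modes for persistent storage:
- ReadWriteOnce (RWO): Volume can be mounted as read-write by a single node; multiple pods on that node can share it
- ReadOnlyMany (ROX): Volume can be mounted read-only by many nodes
- ReadWriteMany (RWX): Volume can be mounted as read-write by many nodes
- ReadWriteOncePod (RWOP): Volume can be mounted as read-write by a single pod (CSI volumes only; stable since Kubernetes 1.29)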
Step-by-Step Configuration Guide
Step 1: Setting Up Local Persistent Volumes
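Local persistent volumes provide high-performance storage by using locally attached disks on worker nodes. This approach suits applications that need low latency and high IOPS. The trade-off is that the data lives on one node: pods using the volume are always scheduled to that node, and the data is unavailable if the node fails.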
First, prepare storage directories on your worker nodes:
```bash
# On each worker node, create storage directories
sudo mkdir -p /mnt/local-storage/vol1
sudo mkdir -p /mnt/local-storage/vol2
sudo mkdir -p /mnt/local-storage/vol3
# Set appropriate permissions
sudo chmod 755 /mnt/local-storage/vol*
```
Create a local persistent volume manifest:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-1
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/local-storage/vol1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1
```
Apply the persistent volume configuration:
```bash
kubectl apply -f local-pv.yaml
kubectl get pv local-pv-1
```
Step 2: Creating Storage Classes
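Storage Classes describe a type of storage and, when backed by a provisioner, enable dynamic provisioning. Local volumes do not support dynamic provisioning or expansion, so the class below uses kubernetes.io/no-provisioner. Its main job is to delay binding until a pod is scheduled (WaitForFirstConsumer), so the scheduler can take the volume's node into account:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
# reclaimPolicy only affects dynamically provisioned volumes; the PV from Step 1 sets its own policy
reclaimPolicy: Retain
```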
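For dynamic provisioning with NFS, deploy an external NFS provisioner first (Kubernetes does not ship one) and reference it from a storage class. The provisioner name and parameters below are placeholders; use the values documented by the provisioner you install:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storage
provisioner: example.com/nfs
parameters:
  server: nfs-server.example.com
  path: /exported/path
  readOnly: "false"
volumeBindingMode: Immediate
allowVolumeExpansion: true
reclaimPolicy: Delete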
```
Apply the storage class configurations:
```bash
kubectl apply -f storage-class-local.yaml
kubectl apply -f storage-class-nfs.yaml
kubectl get storageclass
```
Step 3: Configuring NFS Persistent Storage
NFS provides shared storage accessible from multiple nodes, making it suitable for applications requiring ReadWriteMany access.
First, set up an NFS server (if not already available):
```bash
# On the NFS server (Debian/Ubuntu; on RHEL/CentOS install nfs-utils instead)
sudo apt-get update
sudo apt-get install -y nfs-kernel-server
# Create and configure export directory
sudo mkdir -p /nfs/data
sudo chown nobody:nogroup /nfs/data
sudo chmod 755 /nfs/data
# Configure exports
# NOTE: "*" and no_root_squash are permissive; in production, restrict the export to your node subnet
echo "/nfs/data *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
# Restart NFS services
sudo systemctl restart nfs-kernel-server
sudo exportfs -a
```
Install NFS client utilities on worker nodes:
```bash
# On each worker node
sudo apt-get update
sudo apt-get install -y nfs-common
```
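Before creating the volume, it can help to confirm that a worker node actually sees the export. The following is a minimal sketch that parses the output of `showmount -e` (installed with nfs-common); `nfs_export_visible` is a hypothetical helper name, and the check assumes the server allows showmount queries from the node.

```bash
# Hypothetical helper: succeed if SERVER lists PATH in its export table.
nfs_export_visible() {
  local server="$1" path="$2"
  # The first output line is a header ("Export list for ..."), so skip it.
  showmount -e "$server" 2>/dev/null \
    | awk -v p="$path" 'NR > 1 && $1 == p { found = 1 } END { exit !found }'
}
```

On a worker node, `nfs_export_visible 192.168.1.100 /nfs/data && echo "export visible"` would confirm the export used in this guide.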
Create an NFS persistent volume:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-1
  labels:
    type: nfs
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-storage
  nfs:
    server: 192.168.1.100
    path: /nfs/data
  mountOptions:
    - hard
    - nfsvers=4.1
    - timeo=600
    - retrans=2
```
Step 4: Creating and Using Persistent Volume Claims
Persistent Volume Claims request storage resources for your applications. Create a PVC that will bind to your local persistent volume:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-storage-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 8Gi
  selector:
    matchLabels:
      type: local
```
Create an NFS PVC for shared storage:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-storage-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-storage
  resources:
    requests:
      storage: 20Gi
```
Apply the PVC configurations:
```bash
kubectl apply -f local-pvc.yaml
kubectl apply -f nfs-pvc.yaml
# Check PVC status
# local-storage-claim stays Pending until a pod uses it (WaitForFirstConsumer)
kubectl get pvc
kubectl describe pvc local-storage-claim
```
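In scripts, it is often more convenient to wait for a claim to bind than to check by hand. The sketch below polls the claim's phase with `kubectl get ... -o jsonpath`; `wait_for_pvc_bound` is a hypothetical helper name and the default timeout and interval are arbitrary. It is mainly useful for classes with Immediate binding, such as nfs-storage, because a WaitForFirstConsumer claim will not bind until a pod references it.

```bash
# Hypothetical helper: poll a PVC until it is Bound or TIMEOUT seconds pass.
# Usage: wait_for_pvc_bound NAME [NAMESPACE] [TIMEOUT] [INTERVAL]
wait_for_pvc_bound() {
  local name="$1" namespace="${2:-default}" timeout="${3:-120}" interval="${4:-5}"
  local waited=0 phase=""
  while [ "$waited" -le "$timeout" ]; do
    # An error (for example, the PVC does not exist yet) leaves phase empty.
    phase="$(kubectl get pvc "$name" -n "$namespace" \
      -o jsonpath='{.status.phase}' 2>/dev/null)" || true
    if [ "$phase" = "Bound" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "PVC $namespace/$name not Bound after ${timeout}s (last phase: ${phase:-unknown})" >&2
  return 1
}
```

For example, `wait_for_pvc_bound nfs-storage-claim default 60` returns once the NFS claim is bound, or fails after about a minute.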
Step 5: Deploying Applications with Persistent Storage
Create a deployment that uses persistent storage:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: database-app
  labels:
    app: database
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "secretpassword"
            - name: MYSQL_DATABASE
              value: "myapp"
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: mysql-storage
              mountPath: /var/lib/mysql
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
      volumes:
        - name: mysql-storage
          persistentVolumeClaim:
            claimName: local-storage-claim
```
Deploy the application:
```bash
kubectl apply -f database-deployment.yaml
kubectl get pods
kubectl logs deployment/database-app
```
Advanced Storage Configuration
Dynamic Provisioning with CSI Drivers
Container Storage Interface (CSI) drivers provide a standardized way to integrate external storage systems with Kubernetes. Here's how to configure a CSI driver for Longhorn distributed storage:
```bash
# Install Longhorn using kubectl (each node needs open-iscsi installed first)
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.4.0/deploy/longhorn.yaml
# Wait for Longhorn to be ready
kubectl get pods -n longhorn-system
```
Create a Longhorn storage class:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-storage
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
```
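With a provisioner-backed class in place, creating a claim is enough to get a volume; no PersistentVolume has to be written by hand. The sketch below generates such a claim. `make_pvc_manifest` is a hypothetical helper name, and the claim name `longhorn-data-claim` in the usage note is a placeholder.

```bash
# Hypothetical helper: print a ReadWriteOnce PVC manifest.
# Usage: make_pvc_manifest NAME STORAGE_CLASS SIZE
make_pvc_manifest() {
  local name="$1" storage_class="$2" size="$3"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ${storage_class}
  resources:
    requests:
      storage: ${size}
EOF
}
```

Piping the output to kubectl, as in `make_pvc_manifest longhorn-data-claim longhorn-storage 5Gi | kubectl apply -f -`, should cause the driver to provision a matching volume, which then appears in `kubectl get pv`.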
Volume Snapshots and Backups
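Configure volume snapshots for backup and recovery. Snapshots require a CSI driver that supports them, plus the VolumeSnapshot CRDs and the snapshot controller, which some distributions do not install by default. The source PVC must be provisioned by the same driver as the snapshot class, so the local-storage-claim from Step 4 cannot be snapshotted this way:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot-class
driver: driver.longhorn.io
deletionPolicy: Delete
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: database-snapshot
spec:
  volumeSnapshotClassName: longhorn-snapshot-class
  source:
    # Placeholder name: use a PVC provisioned by driver.longhorn.io
    persistentVolumeClaimName: longhorn-data-claim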
```
Storage Monitoring and Metrics
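Set up monitoring for storage resources. Note that metrics-server and kubectl top report CPU and memory only. Per-volume usage is exposed by the kubelet as metrics such as kubelet_volume_stats_used_bytes, which a Prometheus setup can scrape:
```bash
# Install metrics-server if not present
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Check node and pod CPU/memory usage
kubectl top nodes
kubectl top pods --containers
# Check the status and requested capacity of each claim
kubectl get pvc --all-namespaces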
```
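For a quick look at how full a mounted volume is, one option is to run `df` inside the container. This is a minimal sketch that assumes the container image ships `df`; `volume_usage_percent` is a hypothetical helper name.

```bash
# Hypothetical helper: print how full MOUNT_PATH is, as a bare percentage.
# Usage: volume_usage_percent TARGET MOUNT_PATH [NAMESPACE]
# TARGET is anything kubectl exec accepts, e.g. a pod name or deployment/NAME.
volume_usage_percent() {
  local target="$1" mount_path="$2" namespace="${3:-default}"
  # df -P prints one header line, then one line per filesystem;
  # the fifth column is the usage, e.g. "45%".
  kubectl exec -n "$namespace" "$target" -- df -P "$mount_path" \
    | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For the deployment from Step 5, the call would be `volume_usage_percent deployment/database-app /var/lib/mysql`.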
Practical Examples and Use Cases
Example 1: WordPress with MySQL Backend
This example demonstrates a complete WordPress deployment with persistent storage for both the application and database:
```yaml
# Both claims use local-storage, so matching local PVs (20Gi and 10Gi) must already exist
# MySQL PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 20Gi
---
# WordPress PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 10Gi
---
# MySQL Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "rootpassword"
            - name: MYSQL_DATABASE
              value: "wordpress"
            - name: MYSQL_USER
              value: "wpuser"
            - name: MYSQL_PASSWORD
              value: "wppassword"
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: mysql-storage
              mountPath: /var/lib/mysql
      volumes:
        - name: mysql-storage
          persistentVolumeClaim:
            claimName: mysql-pvc
---
# MySQL Service (provides the DNS name "mysql" that WordPress connects to)
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  selector:
    app: mysql
  ports:
    - port: 3306
---
# WordPress Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
spec:
  # Both replicas share one ReadWriteOnce local volume, so they run on the same node
  replicas: 2
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
        - name: wordpress
          image: wordpress:latest
          env:
            - name: WORDPRESS_DB_HOST
              value: "mysql:3306"
            - name: WORDPRESS_DB_NAME
              value: "wordpress"
            - name: WORDPRESS_DB_USER
              value: "wpuser"
            - name: WORDPRESS_DB_PASSWORD
              value: "wppassword"
          ports:
            - containerPort: 80
          volumeMounts:
            - name: wordpress-storage
              mountPath: /var/www/html
      volumes:
        - name: wordpress-storage
          persistentVolumeClaim:
            claimName: wordpress-pvc
```
Example 2: Shared Storage for Multi-Pod Applications
Configure shared NFS storage for applications that need to share files:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-processor
spec:
  replicas: 3
  selector:
    matchLabels:
      app: file-processor
  template:
    metadata:
      labels:
        app: file-processor
    spec:
      containers:
        - name: processor
          image: alpine:latest
          command: ["/bin/sh"]
          # Create the log directory first; the redirect fails if it does not exist
          args: ["-c", "mkdir -p /shared/logs; while true; do echo 'Processing files...' >> /shared/logs/processor-$(hostname).log; sleep 30; done"]
          volumeMounts:
            - name: shared-storage
              mountPath: /shared
      volumes:
        - name: shared-storage
          persistentVolumeClaim:
            claimName: nfs-storage-claim
```
Common Issues and Troubleshooting
Issue 1: PVC Stuck in Pending State
When a PVC remains in the "Pending" state, it typically indicates that no suitable PV is available or there are binding issues.
Diagnostic steps:
```bash
# Check PVC status and events
kubectl describe pvc <pvc-name>
kubectl get events --field-selector involvedObject.name=<pvc-name>
# Check available PVs
kubectl get pv
kubectl describe pv <pv-name>
# Verify storage class configuration
kubectl get storageclass
kubectl describe storageclass <storageclass-name>
```
Common solutions:
1. Insufficient storage: Ensure PVs have adequate capacity
2. Access mode mismatch: Verify PVC and PV access modes are compatible
3. Storage class issues: Check storage class provisioner and parameters
4. Node affinity: For local storage, ensure pods can be scheduled on nodes with available storage
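On a busy cluster, the first step is often just finding which claims are stuck. The following sketch lists them using `kubectl get` with custom columns; `list_pending_pvcs` is a hypothetical helper name.

```bash
# Hypothetical helper: print NAMESPACE/NAME for every PVC in the Pending phase.
list_pending_pvcs() {
  kubectl get pvc --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase' \
    | awk '$3 == "Pending" { print $1 "/" $2 }'
}
```

Each line of output identifies a claim to inspect with `kubectl describe pvc`. Keep in mind that claims using a WaitForFirstConsumer class are expected to stay Pending until a pod references them.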
Issue 2: Pod Cannot Mount Volume
Pods may fail to start due to volume mounting issues.
Diagnostic commands:
```bash
# Check pod status and events
kubectl describe pod <pod-name>
kubectl logs <pod-name>
# Verify volume attachment
kubectl get volumeattachment
kubectl describe volumeattachment <attachment-name>
# Check node storage status
kubectl describe node <node-name>
```
Resolution steps:
1. Check file permissions: Ensure the container has appropriate permissions
2. Verify mount paths: Confirm mount paths exist and are accessible
3. Storage backend health: Verify NFS servers or other storage backends are operational
4. CSI driver status: For CSI volumes, check driver pod logs
Issue 3: Storage Performance Problems
Poor storage performance can significantly impact application performance.
Performance monitoring:
```bash
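# Monitor I/O statistics on nodes (iostat is part of the sysstat package)
iostat -x 1 5
sudo iotop -o
# Check node and pod CPU/memory usage (kubectl top does not report disk I/O)
kubectl top nodes
kubectl top pods --containers
# Search the API server metrics for storage-related series
kubectl get --raw /metrics | grep storage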
```
Optimization strategies:
1. Choose appropriate storage types: Use local SSDs for high-performance requirements
2. Configure proper I/O schedulers: Optimize Linux I/O schedulers for your workload
3. Implement storage tiering: Use fast storage for active data, slower storage for archives
4. Monitor and alert: Set up comprehensive storage monitoring
Issue 4: Data Loss and Recovery
Implement proper backup and recovery strategies to prevent data loss.
Backup best practices:
```bash
# Create volume snapshots
kubectl apply -f volume-snapshot.yaml
# Verify snapshot creation
kubectl get volumesnapshot
kubectl describe volumesnapshot <snapshot-name>
# Test restore procedures
kubectl apply -f restore-from-snapshot.yaml
```
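A restore manifest is a new PVC whose `dataSource` points at the snapshot. The sketch below generates one; `make_restore_manifest` is a hypothetical helper name, and the claim name and size in the usage note are placeholders. The storage class must use the same CSI driver that took the snapshot, and the requested size must be at least the size of the snapshotted volume.

```bash
# Hypothetical helper: print a PVC manifest that restores from a VolumeSnapshot.
# Usage: make_restore_manifest PVC_NAME SNAPSHOT_NAME STORAGE_CLASS SIZE
make_restore_manifest() {
  local pvc_name="$1" snapshot_name="$2" storage_class="$3" size="$4"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${pvc_name}
spec:
  storageClassName: ${storage_class}
  dataSource:
    name: ${snapshot_name}
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
EOF
}
```

For example, `make_restore_manifest database-restored database-snapshot longhorn-storage 10Gi > restore-from-snapshot.yaml` writes a file that can then be applied with kubectl.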
Best Practices and Professional Tips
Security Considerations
1. Implement proper RBAC: Restrict access to storage resources using Role-Based Access Control
2. Encrypt data at rest: Use storage encryption for sensitive data
3. Network security: Secure NFS and other network storage protocols
4. Regular security audits: Review storage configurations and access patterns
```yaml
# Example RBAC for storage management
# PersistentVolumes and StorageClasses are cluster-scoped, so this must be a ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: storage-admin
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes", "persistentvolumeclaims"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
```
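After binding a role to a user, it is worth confirming what that user can actually do. The sketch below wraps `kubectl auth can-i` with impersonation; `check_pvc_access` is a hypothetical helper name, and the caller needs permission to impersonate users (cluster-admin has it).

```bash
# Hypothetical helper: report whether USER may get, list, create, and delete
# PVCs in NAMESPACE. Usage: check_pvc_access USER [NAMESPACE]
check_pvc_access() {
  local user="$1" namespace="${2:-default}" verb answer
  for verb in get list create delete; do
    # can-i prints "yes" or "no" and exits non-zero on "no", hence "|| true".
    answer="$(kubectl auth can-i "$verb" persistentvolumeclaims \
      --as="$user" -n "$namespace" 2>/dev/null)" || true
    echo "$verb: ${answer:-unknown}"
  done
}
```

Running `check_pvc_access jane default` (with `jane` as a placeholder user) prints one line per verb, for example `create: no`.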
Performance Optimization
1. Choose the right storage type: Match storage characteristics to application requirements
2. Optimize filesystem parameters: Tune filesystem settings for your workload
3. Implement storage monitoring: Use tools like Prometheus and Grafana for storage metrics
4. Regular maintenance: Perform routine storage maintenance and cleanup
Capacity Planning
1. Monitor storage usage trends: Track growth patterns and plan accordingly
2. Implement storage quotas: Use resource quotas to prevent storage exhaustion
3. Automate storage provisioning: Use dynamic provisioning for scalable storage management
4. Plan for disaster recovery: Implement cross-region replication for critical data
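Storage quotas are set per namespace with a ResourceQuota object. The sketch below generates one that caps both the total requested storage and the number of claims; `make_storage_quota` and the quota name `storage-quota` are hypothetical, and the limits in the usage note are arbitrary.

```bash
# Hypothetical helper: print a ResourceQuota limiting PVC storage in a namespace.
# Usage: make_storage_quota NAMESPACE TOTAL_STORAGE MAX_CLAIMS
make_storage_quota() {
  local namespace="$1" total="$2" max_claims="$3"
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: ${namespace}
spec:
  hard:
    requests.storage: ${total}
    persistentvolumeclaims: "${max_claims}"
EOF
}
```

For example, `make_storage_quota default 100Gi 10 | kubectl apply -f -` would reject new claims in the default namespace once existing claims request 100Gi in total or ten claims exist.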
Operational Excellence
1. Document storage architecture: Maintain comprehensive documentation of storage configurations
2. Implement change management: Use GitOps practices for storage configuration changes
3. Regular testing: Test backup and recovery procedures regularly
4. Training and knowledge sharing: Ensure team members understand storage operations
Conclusion and Next Steps
Configuring persistent storage in Kubernetes on Linux requires careful planning and an understanding of the available storage options and their trade-offs. This guide has covered the essentials of Kubernetes storage, from basic concepts to advanced configuration and troubleshooting.
Key takeaways from this guide include:
- Understanding the relationship between Persistent Volumes, Persistent Volume Claims, and Storage Classes
- Implementing various storage backends including local storage, NFS, and CSI drivers
- Following best practices for security, performance, and operational excellence
- Troubleshooting common storage issues and implementing monitoring solutions
Recommended Next Steps
1. Evaluate your storage requirements: Assess your application needs for performance, capacity, and availability
2. Implement monitoring and alerting: Set up comprehensive storage monitoring using tools like Prometheus
3. Develop backup and recovery procedures: Create and test disaster recovery plans
4. Explore advanced storage features: Investigate volume snapshots, cloning, and storage tiering
5. Consider managed storage solutions: Evaluate cloud-native storage options for simplified management
Additional Resources
- Kubernetes Documentation: Official storage documentation and examples
- CSI Driver Registry: Comprehensive list of available CSI drivers
- Storage Vendor Documentation: Specific configuration guides for your storage systems
- Community Forums: Kubernetes community discussions and troubleshooting help
By following the practices and techniques outlined in this guide, you'll be well-equipped to implement robust, scalable, and reliable persistent storage solutions for your Kubernetes applications on Linux platforms. Remember that storage configuration is an iterative process, and continuous monitoring and optimization will help ensure optimal performance and reliability for your applications.