# How to Automate Complex Workflows with Shell Scripts
## Table of Contents
1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Understanding Complex Workflow Automation](#understanding-complex-workflow-automation)
4. [Essential Shell Scripting Concepts](#essential-shell-scripting-concepts)
5. [Building Modular Script Architecture](#building-modular-script-architecture)
6. [Advanced Flow Control and Logic](#advanced-flow-control-and-logic)
7. [Error Handling and Recovery](#error-handling-and-recovery)
8. [Parallel Processing and Job Management](#parallel-processing-and-job-management)
9. [Real-World Workflow Examples](#real-world-workflow-examples)
10. [Integration with External Systems](#integration-with-external-systems)
11. [Monitoring and Logging](#monitoring-and-logging)
12. [Testing and Debugging](#testing-and-debugging)
13. [Best Practices](#best-practices)
14. [Common Issues and Troubleshooting](#common-issues-and-troubleshooting)
15. [Conclusion](#conclusion)
## Introduction
Shell script automation has evolved from simple command sequences to sophisticated workflow orchestration systems capable of managing complex enterprise processes. This comprehensive guide explores advanced techniques for creating robust, maintainable, and scalable shell scripts that can handle intricate workflows involving multiple systems, conditional logic, error recovery, and parallel processing.
Modern workflow automation requires scripts that can adapt to changing conditions, handle failures gracefully, and provide comprehensive monitoring and reporting. Whether you're managing data pipelines, deployment processes, system maintenance tasks, or integration workflows, mastering advanced shell scripting techniques will significantly enhance your automation capabilities.
## Prerequisites
Before diving into complex workflow automation, ensure you have:
### Technical Requirements
- Operating System: Linux, macOS, or Unix-like environment
- Shell: Bash 4.0+ (recommended) or compatible shell
- Text Editor: vi/vim, nano, or your preferred editor
- Basic Tools: grep, sed, awk, curl, jq (for JSON processing)
### Knowledge Requirements
- Basic Shell Scripting: Variables, loops, conditionals
- Command Line Proficiency: File operations, process management
- System Administration: Understanding of processes, permissions, networking
- Regular Expressions: Pattern matching and text processing
### Environment Setup
```bash
# Verify bash version
bash --version
# Install essential tools (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install jq curl wget git
# Install essential tools (CentOS/RHEL)
sudo yum install jq curl wget git
# Create workspace directory
mkdir -p ~/workflow-automation
cd ~/workflow-automation
```
## Understanding Complex Workflow Automation
Complex workflows typically involve multiple interconnected tasks that must execute in specific sequences, handle various conditions, and recover from failures. Understanding the components of such workflows is crucial for effective automation.
### Workflow Characteristics
- Sequential Dependencies: Tasks that must complete before others can begin
- Parallel Processing: Independent tasks that can run simultaneously
- Conditional Logic: Decision points based on data or system states
- Error Recovery: Mechanisms to handle and recover from failures
- Resource Management: Efficient use of system resources and external services
### Planning Your Workflow
Before writing code, create a workflow diagram that identifies the following (a short sketch of encoding such a plan in the shell appears after the list):
1. Entry Points: How the workflow initiates
2. Task Dependencies: Which tasks depend on others
3. Decision Points: Where conditional logic is needed
4. Failure Scenarios: What can go wrong and how to handle it
5. Exit Conditions: How the workflow completes successfully
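Even a rough plan like this can be captured directly in the shell before any real logic is written. The sketch below (task names and ordering are purely illustrative) records dependencies in an associative array and refuses to run a task whose prerequisites have not completed:
```bash
#!/bin/bash
# Minimal dependency-aware runner sketch; task names are placeholders.
declare -A TASK_DEPS=(
  [extract]=""
  [validate]="extract"
  [transform]="validate"
  [load]="transform"
)
declare -A TASK_DONE

run_task() {
  local task="$1"
  # Refuse to run until every prerequisite has completed
  for dep in ${TASK_DEPS[$task]}; do
    if [[ -z "${TASK_DONE[$dep]:-}" ]]; then
      echo "Cannot run '$task': dependency '$dep' has not completed" >&2
      return 1
    fi
  done
  echo "Running task: $task"
  # Real work would happen here
  TASK_DONE["$task"]=1
}

for task in extract validate transform load; do
  run_task "$task" || exit 1
done
```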
## Essential Shell Scripting Concepts
### Advanced Variable Handling
```bash
#!/bin/bash
# Configuration management with associative arrays
declare -A CONFIG
CONFIG[database_host]="localhost"
CONFIG[database_port]="5432"
CONFIG[max_retries]="3"
CONFIG[timeout]="30"
# Function to load configuration from file
load_config() {
local config_file="$1"
if [[ -f "$config_file" ]]; then
while IFS='=' read -r key value; do
# Skip comments and empty lines
[[ $key =~ ^[[:space:]]*# ]] && continue
[[ -z "$key" ]] && continue
# Remove quotes and whitespace
key=$(echo "$key" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
value=$(echo "$value" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//;s/^"//;s/"$//')
CONFIG["$key"]="$value"
done < "$config_file"
fi
}
# Environment-aware configuration
set_environment() {
local env="${1:-development}"
case "$env" in
"production")
CONFIG[log_level]="ERROR"
CONFIG[debug_mode]="false"
;;
"staging")
CONFIG[log_level]="WARN"
CONFIG[debug_mode]="false"
;;
"development")
CONFIG[log_level]="DEBUG"
CONFIG[debug_mode]="true"
;;
esac
}
```
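As a usage sketch (the file name and keys are illustrative), the defaults above can be overridden from a configuration file and an environment selection:
```bash
# Example config/default.conf contents (illustrative):
#   database_host="db.internal"
#   max_retries="5"

load_config "config/default.conf"
set_environment "production"

echo "Connecting to ${CONFIG[database_host]}:${CONFIG[database_port]}"
echo "Log level: ${CONFIG[log_level]}"
```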
### Function Libraries and Modularity
```bash
#!/bin/bash
# logging.sh - Logging utility functions
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
LOG_FILE="${SCRIPT_DIR}/workflow.log"
LOG_LEVEL="${LOG_LEVEL:-INFO}"
# Logging function with levels
log() {
local level="$1"
shift
local message="$*"
local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
# Define log level hierarchy
declare -A levels=([DEBUG]=0 [INFO]=1 [WARN]=2 [ERROR]=3)
local current_level=${levels[$LOG_LEVEL]:-1}
local msg_level=${levels[$level]:-1}
# Only log if message level is >= current level
if (( msg_level >= current_level )); then
echo "[$timestamp] [$level] $message" | tee -a "$LOG_FILE"
fi
}
# Specialized logging functions
log_debug() { log "DEBUG" "$@"; }
log_info() { log "INFO" "$@"; }
log_warn() { log "WARN" "$@"; }
log_error() { log "ERROR" "$@"; }
# Progress tracking
show_progress() {
local current="$1"
local total="$2"
local description="$3"
local percent=$((current * 100 / total))
local bar_length=50
local filled_length=$((percent * bar_length / 100))
printf "\r["
printf "%${filled_length}s" | tr ' ' '='
printf "%$((bar_length - filled_length))s" | tr ' ' '-'
printf "] %d%% %s" "$percent" "$description"
if (( current == total )); then
echo ""
fi
}
```
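A short usage sketch for the logging library, sourcing it from another script and driving the progress bar (the path and loop are illustrative):
```bash
#!/bin/bash
source "./logging.sh"   # illustrative path to the library above

log_info "Starting batch run"
total=10
for i in $(seq 1 "$total"); do
  sleep 0.1   # stand-in for real work
  show_progress "$i" "$total" "processing item $i"
done
log_warn "Finished with warnings"
```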
## Building Modular Script Architecture
### Directory Structure
```
workflow-automation/
├── bin/ # Main executable scripts
│ ├── main-workflow.sh
│ └── sub-workflow.sh
├── lib/ # Function libraries
│ ├── logging.sh
│ ├── database.sh
│ ├── api-client.sh
│ └── file-operations.sh
├── config/ # Configuration files
│ ├── default.conf
│ ├── production.conf
│ └── development.conf
├── templates/ # Template files
│ ├── email-template.html
│ └── report-template.sql
├── data/ # Data files and temporary storage
│ ├── input/
│ ├── output/
│ └── temp/
└── logs/ # Log files
├── workflow.log
└── error.log
```
### Main Workflow Controller
```bash
#!/bin/bash
# bin/main-workflow.sh - Main workflow controller
set -euo pipefail # Exit on error, undefined vars, pipe failures
# Get script directory and set up paths
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
LIB_DIR="$PROJECT_ROOT/lib"
CONFIG_DIR="$PROJECT_ROOT/config"
DATA_DIR="$PROJECT_ROOT/data"
LOG_DIR="$PROJECT_ROOT/logs"
# Create directories if they don't exist
mkdir -p "$DATA_DIR"/{input,output,temp} "$LOG_DIR"
# Source library functions
source "$LIB_DIR/logging.sh"
source "$LIB_DIR/database.sh"
source "$LIB_DIR/api-client.sh"
source "$LIB_DIR/file-operations.sh"
# Global variables
WORKFLOW_ID=""
START_TIME=""
ENVIRONMENT="${ENVIRONMENT:-development}"
DRY_RUN="${DRY_RUN:-false}"
# Initialize workflow
initialize_workflow() {
START_TIME=$(date '+%Y-%m-%d_%H-%M-%S')
WORKFLOW_ID="workflow_${START_TIME}_$$"
log_info "Starting workflow: $WORKFLOW_ID"
log_info "Environment: $ENVIRONMENT"
log_info "Dry run mode: $DRY_RUN"
# Load configuration
load_config "$CONFIG_DIR/default.conf"
load_config "$CONFIG_DIR/$ENVIRONMENT.conf"
# Set up signal handlers
trap cleanup_workflow EXIT
trap 'log_error "Workflow interrupted"; exit 130' INT TERM
}
# Cleanup function
cleanup_workflow() {
local exit_code=$?
local end_time=$(date '+%Y-%m-%d %H:%M:%S')
if (( exit_code == 0 )); then
log_info "Workflow completed successfully: $WORKFLOW_ID"
else
log_error "Workflow failed with exit code: $exit_code"
fi
log_info "End time: $end_time"
# Cleanup temporary files
if [[ -d "$DATA_DIR/temp" ]]; then
find "$DATA_DIR/temp" -name "${WORKFLOW_ID}*" -type f -delete 2>/dev/null || true
fi
}
```
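The controller defines initialization and cleanup but no entry point; a minimal `main` sketch (the argument handling is an assumption, not part of the original script) could tie the pieces together:
```bash
# Hypothetical entry point for bin/main-workflow.sh
main() {
  # Usage: main-workflow.sh [environment], e.g. ./main-workflow.sh production
  ENVIRONMENT="${1:-$ENVIRONMENT}"
  initialize_workflow

  log_info "Running workflow steps"
  # Individual steps would be invoked here, for example:
  # extract_data_from_source "mysql" "$DATE_RANGE"
}

main "$@"
```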
## Advanced Flow Control and Logic
### State Machine Implementation
```bash
#!/bin/bash
# State machine for complex workflow control
declare -A WORKFLOW_STATES
declare -A STATE_TRANSITIONS
CURRENT_STATE="INIT"
# Define workflow states
WORKFLOW_STATES=(
[INIT]="initialize_system"
[VALIDATE]="validate_inputs"
[PROCESS]="process_data"
[TRANSFORM]="transform_results"
[DEPLOY]="deploy_changes"
[VERIFY]="verify_deployment"
[COMPLETE]="finalize_workflow"
[ERROR]="handle_error"
[ROLLBACK]="rollback_changes"
)
# Define valid state transitions
STATE_TRANSITIONS=(
[INIT]="VALIDATE ERROR"
[VALIDATE]="PROCESS ERROR"
[PROCESS]="TRANSFORM ERROR"
[TRANSFORM]="DEPLOY ERROR"
[DEPLOY]="VERIFY ERROR ROLLBACK"
[VERIFY]="COMPLETE ERROR ROLLBACK"
[COMPLETE]=""
[ERROR]="ROLLBACK"
[ROLLBACK]="ERROR"
)
# Execute state machine
execute_state_machine() {
local max_iterations=20
local iteration=0
while [[ "$CURRENT_STATE" != "COMPLETE" && "$CURRENT_STATE" != "ERROR" ]]; do
((iteration++))
if (( iteration > max_iterations )); then
log_error "State machine exceeded maximum iterations"
CURRENT_STATE="ERROR"
break
fi
log_info "Executing state: $CURRENT_STATE (iteration $iteration)"
# Get function for current state
local state_function="${WORKFLOW_STATES[$CURRENT_STATE]}"
if [[ -z "$state_function" ]]; then
log_error "No function defined for state: $CURRENT_STATE"
CURRENT_STATE="ERROR"
continue
fi
# Execute state function
if $state_function; then
log_info "State $CURRENT_STATE completed successfully"
else
log_error "State $CURRENT_STATE failed"
CURRENT_STATE="ERROR"
fi
done
log_info "State machine finished in state: $CURRENT_STATE"
}
```
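Note that the loop relies on each state function advancing `CURRENT_STATE`, and the `STATE_TRANSITIONS` table is declared but never enforced. A small helper along these lines (the `transition_to` name is an assumption) could validate every state change:
```bash
# Validate a requested state change against STATE_TRANSITIONS
transition_to() {
  local next_state="$1"
  local allowed=" ${STATE_TRANSITIONS[$CURRENT_STATE]} "

  if [[ "$allowed" == *" $next_state "* ]]; then
    log_debug "Transition: $CURRENT_STATE -> $next_state"
    CURRENT_STATE="$next_state"
    return 0
  fi

  log_error "Illegal transition: $CURRENT_STATE -> $next_state"
  CURRENT_STATE="ERROR"
  return 1
}

# A state function would then call, for example, `transition_to VALIDATE` on success.
```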
## Error Handling and Recovery
### Comprehensive Error Management
```bash
#!/bin/bash
# Advanced error handling and recovery mechanisms
# Error severity levels
declare -A ERROR_LEVELS=(
[CRITICAL]=3
[HIGH]=2
[MEDIUM]=1
[LOW]=0
)
# Enhanced error handling function
handle_error() {
local error_code="$1"
local error_message="$2"
local operation="$3"
local severity="${4:-MEDIUM}"
local recovery_strategy="${5:-RETRY}"
log_error "Error in operation '$operation': $error_message (Code: $error_code)"
# Record error details
local error_record=$(cat <<EOF
{"timestamp": "$(date -u '+%Y-%m-%dT%H:%M:%SZ')", "workflow_id": "${WORKFLOW_ID:-unknown}", "operation": "$operation", "error_code": "$error_code", "message": "$error_message", "severity": "$severity"}
EOF
)
echo "$error_record" >> "$LOG_DIR/error-details.json"
# Execute recovery strategy
case "$recovery_strategy" in
"RETRY")
retry_with_backoff "$operation" 3
;;
"ROLLBACK")
execute_rollback_procedure "$operation"
;;
"SKIP")
log_warn "Skipping operation '$operation' due to error"
;;
"MANUAL")
request_manual_intervention "$operation" "$error_message"
;;
esac
}
# Retry mechanism with exponential backoff
retry_with_backoff() {
local operation="$1"
local max_attempts="${2:-3}"
local base_delay="${3:-1}"
local max_delay="${4:-60}"
local attempt=1
local delay="$base_delay"
while (( attempt <= max_attempts )); do
log_info "Attempting operation '$operation' (attempt $attempt/$max_attempts)"
if eval "$operation"; then
log_info "Operation '$operation' succeeded on attempt $attempt"
return 0
fi
if (( attempt < max_attempts )); then
log_warn "Operation '$operation' failed, retrying in ${delay}s"
sleep "$delay"
# Exponential backoff with jitter
delay=$(( delay * 2 + RANDOM % 5 ))
if (( delay > max_delay )); then
delay="$max_delay"
fi
fi
((attempt++))
done
log_error "Operation '$operation' failed after $max_attempts attempts"
return 1
}
```
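A usage sketch for the retry helper (the command and URL are placeholders):
```bash
# Retry a flaky health check up to 5 times, starting with a 2-second delay
if retry_with_backoff "curl --silent --fail https://example.com/health" 5 2; then
  log_info "Service is healthy"
else
  log_error "Service did not become healthy in time"
fi
```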
## Parallel Processing and Job Management
### Advanced Parallel Execution
```bash
#!/bin/bash
# Parallel processing and job management
# Job management system
declare -A ACTIVE_JOBS
declare -A JOB_STATUS
declare -A JOB_RESULTS
MAX_CONCURRENT_JOBS="${MAX_CONCURRENT_JOBS:-4}"
# Start background job with monitoring
start_background_job() {
local job_name="$1"
local job_command="$2"
local job_timeout="${3:-300}" # 5 minutes default
# Check if we've reached the maximum concurrent jobs
while (( ${#ACTIVE_JOBS[@]} >= MAX_CONCURRENT_JOBS )); do
log_debug "Maximum concurrent jobs reached, waiting..."
sleep 2
cleanup_completed_jobs
done
log_info "Starting background job: $job_name"
# Start job in background with timeout
timeout "$job_timeout" bash -c "$job_command" &
local job_pid=$!
# Register job
ACTIVE_JOBS["$job_name"]="$job_pid"
JOB_STATUS["$job_name"]="RUNNING"
log_info "Job started: $job_name (PID: $job_pid)"
return 0
}
# Parallel data processing example
process_files_parallel() {
local input_dir="$1"
local output_dir="$2"
local processor_function="$3"
mkdir -p "$output_dir"
local file_count=0
local files=()
# Collect files to process
while IFS= read -r -d '' file; do
files+=("$file")
((file_count++))
done < <(find "$input_dir" -name "*.csv" -print0)
if (( file_count == 0 )); then
log_warn "No files found in $input_dir"
return 0
fi
log_info "Processing $file_count files in parallel"
# Process files in batches
local processed=0
for file in "${files[@]}"; do
local job_name="process_$(basename "$file")"
local output_file="$output_dir/$(basename "$file" .csv)_processed.csv"
start_background_job "$job_name" "$processor_function '$file' '$output_file'"
((processed++))
show_progress "$processed" "$file_count" "Starting jobs"
done
# Wait for all jobs to complete
wait_for_all_jobs
log_info "All file processing jobs completed"
}
```
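The functions above call `cleanup_completed_jobs` and `wait_for_all_jobs`, which are not shown. One possible implementation, assuming the bookkeeping arrays declared earlier, is sketched below:
```bash
# Remove finished jobs from ACTIVE_JOBS and record their exit status
cleanup_completed_jobs() {
  local job_name
  for job_name in "${!ACTIVE_JOBS[@]}"; do
    local pid="${ACTIVE_JOBS[$job_name]}"
    if ! kill -0 "$pid" 2>/dev/null; then
      # Job has exited; collect its exit code without tripping set -e
      local exit_code=0
      wait "$pid" || exit_code=$?
      if (( exit_code == 0 )); then
        JOB_STATUS["$job_name"]="SUCCESS"
      else
        JOB_STATUS["$job_name"]="FAILED"
      fi
      JOB_RESULTS["$job_name"]="$exit_code"
      unset "ACTIVE_JOBS[$job_name]"
      log_info "Job finished: $job_name (exit: $exit_code)"
    fi
  done
}

# Block until every registered job has completed
wait_for_all_jobs() {
  while (( ${#ACTIVE_JOBS[@]} > 0 )); do
    cleanup_completed_jobs
    sleep 1
  done
}
```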
## Real-World Workflow Examples
### Data Pipeline Workflow
```bash
#!/bin/bash
# Complete data pipeline workflow example
data_pipeline_workflow() {
local source_system="$1"
local target_system="$2"
local date_range="$3"
log_info "Starting data pipeline: $source_system -> $target_system"
# Step 1: Extract data
if ! extract_data_from_source "$source_system" "$date_range"; then
handle_error 1 "Data extraction failed" "extract_data"
return 1
fi
# Step 2: Validate extracted data
local extracted_files=("$DATA_DIR/temp"/*_extracted_*.csv)
for file in "${extracted_files[@]}"; do
if [[ -f "$file" ]]; then
validate_data_quality "$file" || {
handle_error 2 "Data quality check failed for $file" "validate_data"
return 1
}
fi
done
# Step 3: Transform data in parallel
transform_data_parallel "$DATA_DIR/temp" "$DATA_DIR/output"
# Step 4: Load to target system
load_data_to_target "$target_system" "$DATA_DIR/output"
log_info "Data pipeline completed successfully"
}
# Data extraction function
extract_data_from_source() {
local source="$1"
local date_range="$2"
case "$source" in
"mysql")
extract_mysql_data "$date_range"
;;
"postgresql")
extract_postgresql_data "$date_range"
;;
"api")
extract_api_data "$date_range"
;;
*)
log_error "Unsupported source system: $source"
return 1
;;
esac
}
# MySQL data extraction
extract_mysql_data() {
local date_range="$1"
local output_file="$DATA_DIR/temp/mysql_extracted_$(date +%Y%m%d).csv"
log_info "Extracting data from MySQL for date range: $date_range"
mysql -h "${CONFIG[mysql_host]}" \
-u "${CONFIG[mysql_user]}" \
-p"${CONFIG[mysql_password]}" \
-D "${CONFIG[mysql_database]}" \
--batch --raw \
-e "SELECT * FROM transactions WHERE date_created BETWEEN '$date_range'" \
> "$output_file"
}
```
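The pipeline above also calls `validate_data_quality`, which is not defined in this guide; a minimal sketch (the checks are illustrative) might confirm that a CSV file is non-empty and that every row has the same number of fields as its header:
```bash
# Basic CSV sanity checks; extend with domain-specific rules as needed
validate_data_quality() {
  local file="$1"

  if [[ ! -s "$file" ]]; then
    log_error "File is empty: $file"
    return 1
  fi

  local expected_fields
  expected_fields=$(head -1 "$file" | awk -F',' '{print NF}')

  local bad_rows
  bad_rows=$(awk -F',' -v n="$expected_fields" 'NF != n' "$file" | wc -l)

  if (( bad_rows > 0 )); then
    log_error "Found $bad_rows malformed rows in $file"
    return 1
  fi

  log_info "Data quality check passed: $file"
  return 0
}
```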
### Deployment Workflow
```bash
#!/bin/bash
# Application deployment workflow
deployment_workflow() {
local application="$1"
local version="$2"
local environment="$3"
log_info "Starting deployment: $application v$version to $environment"
# Pre-deployment checks
if ! pre_deployment_checks "$application" "$environment"; then
log_error "Pre-deployment checks failed"
return 1
fi
# Backup current version
backup_current_version "$application" "$environment"
add_rollback_command "restore_backup '$application' '$environment'" \
"Restore $application backup in $environment"
# Deploy new version
if ! deploy_application "$application" "$version" "$environment"; then
log_error "Deployment failed, initiating rollback"
execute_rollback "Deployment failure"
return 1
fi
# Post-deployment verification
if ! verify_deployment "$application" "$environment"; then
log_error "Deployment verification failed, initiating rollback"
execute_rollback "Verification failure"
return 1
fi
log_info "Deployment completed successfully"
}
# Pre-deployment checks
pre_deployment_checks() {
local application="$1"
local environment="$2"
log_info "Running pre-deployment checks"
# Check disk space
local available_space=$(df "$DEPLOY_DIR" | awk 'NR==2 {print $4}')
local required_space=1048576 # 1GB in KB
if (( available_space < required_space )); then
log_error "Insufficient disk space. Required: ${required_space}KB, Available: ${available_space}KB"
return 1
fi
# Check if application is running
if ! systemctl is-active --quiet "$application"; then
log_error "Application $application is not running"
return 1
fi
# Check database connectivity
if ! test_database_connection; then
log_error "Database connectivity check failed"
return 1
fi
log_info "Pre-deployment checks passed"
return 0
}
```
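The deployment workflow references `add_rollback_command` and `execute_rollback`, which are not shown above. A simple rollback stack, sketched here as one possible approach, records commands as they become relevant and replays them in reverse order:
```bash
# Rollback commands are pushed as deployment steps succeed and run newest-first
declare -a ROLLBACK_COMMANDS=()
declare -a ROLLBACK_DESCRIPTIONS=()

add_rollback_command() {
  local command="$1"
  local description="$2"
  ROLLBACK_COMMANDS+=("$command")
  ROLLBACK_DESCRIPTIONS+=("$description")
  log_debug "Registered rollback step: $description"
}

execute_rollback() {
  local reason="$1"
  log_warn "Rolling back due to: $reason"

  # Walk the stack from the most recent entry to the oldest
  local i
  for (( i = ${#ROLLBACK_COMMANDS[@]} - 1; i >= 0; i-- )); do
    log_info "Rollback: ${ROLLBACK_DESCRIPTIONS[$i]}"
    if ! eval "${ROLLBACK_COMMANDS[$i]}"; then
      log_error "Rollback step failed: ${ROLLBACK_DESCRIPTIONS[$i]}"
    fi
  done

  ROLLBACK_COMMANDS=()
  ROLLBACK_DESCRIPTIONS=()
}
```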
## Integration with External Systems
### API Integration
```bash
#!/bin/bash
# API integration functions
# Generic API client with authentication
api_call() {
local method="$1"
local endpoint="$2"
local data="${3:-}"
local auth_header="Authorization: Bearer ${API_TOKEN}"
local content_type="Content-Type: application/json"
local curl_opts=(
--silent
--show-error
--fail
--max-time 30
--retry 3
--retry-delay 1
--header "$auth_header"
--header "$content_type"
)
case "$method" in
"GET")
curl "${curl_opts[@]}" "$endpoint"
;;
"POST")
curl "${curl_opts[@]}" --data "$data" "$endpoint"
;;
"PUT")
curl "${curl_opts[@]}" --request PUT --data "$data" "$endpoint"
;;
"DELETE")
curl "${curl_opts[@]}" --request DELETE "$endpoint"
;;
esac
}
# Webhook integration
send_webhook_notification() {
local webhook_url="$1"
local event_type="$2"
local message="$3"
local payload=$(cat <<EOF
{
  "event": "$event_type",
  "workflow_id": "${WORKFLOW_ID:-unknown}",
  "message": "$message",
  "timestamp": "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
}
EOF
)
curl --silent --show-error --fail \
--header "Content-Type: application/json" \
--data "$payload" \
"$webhook_url"
}
# Database health check
check_database_health() {
# NOTE: reconstructed check; psql and the CONFIG keys used here are illustrative
if psql -h "${CONFIG[database_host]}" -p "${CONFIG[database_port]}" -c "SELECT 1" >/dev/null 2>&1; then
log_info "Database health check passed"
return 0
else
log_error "Database health check failed"
return 1
fi
}
```
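A usage sketch for the API client (the endpoint, token, and payload are placeholders):
```bash
export API_TOKEN="replace-with-a-real-token"   # placeholder

# Fetch a resource and extract a field with jq
status=$(api_call GET "https://api.example.com/v1/jobs/42" | jq -r '.status')
log_info "Remote job status: $status"

# Create a resource
api_call POST "https://api.example.com/v1/jobs" '{"name": "nightly-report"}'

# Notify a chat webhook that the workflow finished
send_webhook_notification "https://hooks.example.com/workflows" "workflow_completed" "Nightly run finished"
```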
## Monitoring and Logging
### Comprehensive Logging System
```bash
#!/bin/bash
# Advanced logging and monitoring
# Structured logging with JSON output
log_structured() {
local level="$1"
local component="$2"
local event="$3"
local message="$4"
shift 4
local metadata="$*"
local log_entry=$(cat <<EOF
{"timestamp": "$(date -u '+%Y-%m-%dT%H:%M:%SZ')", "level": "$level", "component": "$component", "event": "$event", "message": "$message", "metadata": {$metadata}}
EOF
)
echo "$log_entry" >> "$LOG_DIR/structured.log"
# Also log to standard log for readability
log "$level" "[$component:$event] $message"
}
# Performance monitoring
start_performance_monitor() {
local operation_name="$1"
local monitor_file="/tmp/monitor_${operation_name}_$$.json"
local start_data=$(cat <<EOF
{"operation": "$operation_name", "start_time": $(date +%s), "start_memory": $(ps -o rss= -p $$ | tr -d ' ')}
EOF
)
echo "$start_data" > "$monitor_file"
echo "$monitor_file"
}
stop_performance_monitor() {
local monitor_file="$1"
if [[ -f "$monitor_file" ]]; then
local start_data=$(cat "$monitor_file")
local operation=$(echo "$start_data" | jq -r '.operation')
local start_time=$(echo "$start_data" | jq -r '.start_time')
local start_memory=$(echo "$start_data" | jq -r '.start_memory')
local end_time=$(date +%s)
local end_memory=$(ps -o rss= -p $$ | tr -d ' ')
local duration=$((end_time - start_time))
local memory_diff=$((end_memory - start_memory))
log_structured "INFO" "performance" "operation_completed" \
"Operation $operation completed" \
"\"duration\": $duration, \"memory_used\": $memory_diff"
rm -f "$monitor_file"
fi
}
```
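A usage sketch tying the two monitoring helpers together around a single expensive step:
```bash
# Measure one expensive step of the workflow
monitor_file=$(start_performance_monitor "transform_step")
sleep 2   # stand-in for the real operation being measured
stop_performance_monitor "$monitor_file"
```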
## Testing and Debugging
### Unit Testing Framework
```bash
#!/bin/bash
# Simple unit testing framework for shell scripts
# Test framework variables
declare -a TEST_FUNCTIONS
declare -A TEST_RESULTS
TEST_COUNT=0
PASSED_COUNT=0
FAILED_COUNT=0
# Test assertion functions
assert_equals() {
local expected="$1"
local actual="$2"
local test_name="$3"
((TEST_COUNT++))
if [[ "$expected" == "$actual" ]]; then
TEST_RESULTS["$test_name"]="PASS"
((PASSED_COUNT++))
log_info "✓ $test_name"
else
TEST_RESULTS["$test_name"]="FAIL"
((FAILED_COUNT++))
log_error "✗ $test_name - Expected: '$expected', Got: '$actual'"
fi
}
assert_true() {
local condition="$1"
local test_name="$2"
if eval "$condition"; then
assert_equals "true" "true" "$test_name"
else
assert_equals "true" "false" "$test_name"
fi
}
# Run all tests
run_tests() {
log_info "Running unit tests..."
# Discover and run test functions
for func in $(declare -F | grep "^declare -f test_" | sed 's/declare -f //'); do
log_info "Running $func"
$func
done
# Print summary
log_info "Test Results: $PASSED_COUNT passed, $FAILED_COUNT failed, $TEST_COUNT total"
if (( FAILED_COUNT > 0 )); then
return 1
fi
}
# Example test functions
test_config_loading() {
# Test configuration loading
CONFIG[test_key]="test_value"
assert_equals "test_value" "${CONFIG[test_key]}" "Config loading test"
}
test_file_operations() {
# Test file operations
local test_file="/tmp/test_file_$$"
echo "test content" > "$test_file"
assert_true "[[ -f '$test_file' ]]" "File creation test"
local content=$(cat "$test_file")
assert_equals "test content" "$content" "File content test"
rm -f "$test_file"
}
```
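To run the suite, a small runner script can source the framework and libraries and call `run_tests` (file names and paths here are illustrative):
```bash
#!/bin/bash
# tests/run-all.sh (illustrative path)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

source "$SCRIPT_DIR/../lib/logging.sh"
source "$SCRIPT_DIR/test-framework.sh"   # the framework and test functions above

if run_tests; then
  echo "All tests passed"
else
  echo "Some tests failed" >&2
  exit 1
fi
```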
### Debugging Utilities
```bash
#!/bin/bash
# Debugging utilities
# Debug mode activation
enable_debug_mode() {
set -x # Enable trace mode
LOG_LEVEL="DEBUG"
CONFIG[debug_mode]="true"
log_info "Debug mode enabled"
}
# Function execution tracer
trace_function() {
local function_name="$1"
shift
log_debug ">>> Entering function: $function_name with args: $*"
# Execute function with timing
local start_time=$(date +%s.%N)
"$function_name" "$@"
local exit_code=$?
local end_time=$(date +%s.%N)
local duration=$(echo "$end_time - $start_time" | bc)
log_debug "<<< Exiting function: $function_name (exit: $exit_code, duration: ${duration}s)"
return $exit_code
}
# Interactive debugger
debug_breakpoint() {
local message="${1:-Breakpoint reached}"
if [[ "${CONFIG[debug_mode]}" == "true" ]]; then
log_warn "BREAKPOINT: $message"
echo "Variables:"
set | grep -E '^[A-Z_]+=' | head -20
echo "Press Enter to continue, 'q' to quit, or 'c' to disable breakpoints..."
read -r response
case "$response" in
'q'|'quit')
log_info "Exiting due to user request"
exit 0
;;
'c'|'continue')
CONFIG[debug_mode]="false"
log_info "Breakpoints disabled"
;;
esac
fi
}
```
## Best Practices
### Security Considerations
```bash
#!/bin/bash
# Security best practices
# Secure credential handling
load_credentials() {
local credential_file="$1"
# Check file permissions
if [[ $(stat -c %a "$credential_file") != "600" ]]; then
log_error "Credential file has unsafe permissions"
return 1
fi
# Load credentials into memory only
while IFS='=' read -r key value; do
[[ $key =~ ^[[:space:]]*# ]] && continue
[[ -z "$key" ]] && continue
# Export as environment variable with prefix
export "CRED_${key}=${value}"
done < "$credential_file"
}
# Input sanitization
sanitize_input() {
local input="$1"
local pattern="$2"
# Remove potentially dangerous characters
input=$(echo "$input" | tr -d ';|&$`<>')
# Validate against pattern if provided
if [[ -n "$pattern" ]] && [[ ! "$input" =~ $pattern ]]; then
log_error "Input validation failed: $input"
return 1
fi
echo "$input"
}
# Secure temporary file creation
create_secure_temp_file() {
local prefix="${1:-workflow}"
local temp_file=$(mktemp "/tmp/${prefix}.XXXXXX")
# Set restrictive permissions
chmod 600 "$temp_file"
# Add to cleanup list
TEMP_FILES+=("$temp_file")
echo "$temp_file"
}
```
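`create_secure_temp_file` appends each file it makes to a `TEMP_FILES` array; a matching cleanup trap (a small sketch) ensures those files disappear on any exit path:
```bash
declare -a TEMP_FILES=()

cleanup_temp_files() {
  local f
  if (( ${#TEMP_FILES[@]} > 0 )); then
    for f in "${TEMP_FILES[@]}"; do
      rm -f "$f"
    done
  fi
}

# Remove registered temporary files whether the script succeeds or fails
trap cleanup_temp_files EXIT
```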
### Performance Optimization
```bash
#!/bin/bash
# Performance optimization techniques
# Efficient file processing
process_large_file() {
local input_file="$1"
local chunk_size="${2:-1000}"
log_info "Processing large file: $input_file"
# Process file in chunks to manage memory
split -l "$chunk_size" "$input_file" "/tmp/chunk_"
for chunk in /tmp/chunk_*; do
process_chunk "$chunk" &
done
wait # Wait for all background processes
# Cleanup chunks
rm -f /tmp/chunk_*
}
# Memory-efficient data processing
efficient_data_transform() {
local input_file="$1"
local output_file="$2"
# Use pipes to avoid loading entire file into memory
awk 'BEGIN {OFS=","} {gsub(/old_value/, "new_value"); print}' "$input_file" > "$output_file"
}
# Optimize system resource usage
optimize_system_resources() {
# Set optimal nice value
renice -n 10 $$
# Limit memory usage if available
if command -v ulimit >/dev/null; then
ulimit -m 1048576 # 1GB memory limit
fi
# Set optimal I/O scheduling
if [[ -f /sys/block/sda/queue/scheduler ]]; then
echo "deadline" > /sys/block/sda/queue/scheduler 2>/dev/null || true
fi
}
```
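`process_large_file` above delegates the per-chunk work to `process_chunk`, which is left to the caller; as an illustration only, a chunk processor could look like this:
```bash
# Illustrative chunk processor: squeeze repeated whitespace and write one
# output file per chunk, outside /tmp/chunk_* so the cleanup step in
# process_large_file does not delete the results
process_chunk() {
  local chunk_file="$1"
  local out_file="$DATA_DIR/output/$(basename "$chunk_file").out"
  sed 's/[[:space:]]\{1,\}/ /g' "$chunk_file" > "$out_file"
}
```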
## Common Issues and Troubleshooting
### Debugging Common Problems
```bash
#!/bin/bash
# Common troubleshooting utilities
# System resource checker
check_system_resources() {
log_info "Checking system resources..."
# Check memory usage
local memory_usage=$(free | grep Mem | awk '{print int($3/$2 * 100.0)}')
if (( memory_usage > 90 )); then
log_warn "High memory usage: ${memory_usage}%"
fi
# Check disk space
local disk_usage=$(df / | tail -1 | awk '{print int($5)}')
if (( disk_usage > 85 )); then
log_warn "High disk usage: ${disk_usage}%"
fi
# Check load average
local load_avg=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
local cpu_cores=$(nproc)
local load_threshold=$(echo "$cpu_cores * 2" | bc)
if (( $(echo "$load_avg > $load_threshold" | bc -l) )); then
log_warn "High system load: $load_avg (threshold: $load_threshold)"
fi
}
# Network connectivity troubleshooter
troubleshoot_connectivity() {
local host="$1"
local port="$2"
log_info "Troubleshooting connectivity to $host:$port"
# Check DNS resolution
if ! nslookup "$host" >/dev/null 2>&1; then
log_error "DNS resolution failed for $host"
return 1
fi
# Check port connectivity
if command -v nc >/dev/null; then
if ! nc -z "$host" "$port" 2>/dev/null; then
log_error "Cannot connect to $host:$port"
return 1
fi
fi
log_info "Connectivity check passed"
}
# Log analyzer
analyze_logs() {
local log_file="$1"
if [[ ! -f "$log_file" ]]; then
log_error "Log file not found: $log_file"
return 1
fi
local error_count=$(grep -c "ERROR" "$log_file")
local warn_count=$(grep -c "WARN" "$log_file")
log_info "Log analysis for $log_file:"
log_info " Errors: $error_count"
log_info " Warnings: $warn_count"
if (( error_count > 0 )); then
log_info "Recent errors:"
grep "ERROR" "$log_file" | tail -5
fi
}
# Process troubleshooter
troubleshoot_processes() {
local process_name="$1"
log_info "Troubleshooting process: $process_name"
local pids=$(pgrep "$process_name")
if [[ -z "$pids" ]]; then
log_warn "No processes found matching: $process_name"
return 1
fi
for pid in $pids; do
local cpu_usage=$(ps -p "$pid" -o %cpu= | tr -d ' ')
local memory_usage=$(ps -p "$pid" -o %mem= | tr -d ' ')
local status=$(ps -p "$pid" -o state= | tr -d ' ')
log_info "PID $pid: CPU=$cpu_usage%, Memory=$memory_usage%, Status=$status"
done
}
```
### Error Recovery Patterns
```bash
#!/bin/bash
# Common error recovery patterns
# Graceful degradation
implement_graceful_degradation() {
local primary_service="$1"
local fallback_service="$2"
if ! check_service_health "$primary_service"; then
log_warn "Primary service unavailable, switching to fallback"
if check_service_health "$fallback_service"; then
log_info "Using fallback service: $fallback_service"
export ACTIVE_SERVICE="$fallback_service"
return 0
else
log_error "Both primary and fallback services unavailable"
return 1
fi
fi
export ACTIVE_SERVICE="$primary_service"
}
# Auto-recovery mechanism
setup_auto_recovery() {
local service_name="$1"
local recovery_command="$2"
while true; do
if ! check_service_health "$service_name"; then
log_warn "Service $service_name is down, attempting recovery"
if eval "$recovery_command"; then
log_info "Service $service_name recovered successfully"
else
log_error "Failed to recover service $service_name"
sleep 60 # Wait before next attempt
fi
fi
sleep 30 # Check every 30 seconds
done &
local recovery_pid=$!
RECOVERY_PROCESSES+=("$recovery_pid")
log_info "Auto-recovery enabled for $service_name (PID: $recovery_pid)"
}
```
## Conclusion
Mastering complex workflow automation with shell scripts requires understanding multiple interconnected concepts: modular architecture, advanced error handling, parallel processing, external system integration, and comprehensive monitoring. The techniques and patterns presented in this guide provide a solid foundation for building robust, maintainable, and scalable automation solutions.
### Key Takeaways
1. Modular Design: Break complex workflows into manageable, reusable components
2. Error Handling: Implement comprehensive error detection, recovery, and rollback mechanisms
3. Monitoring: Use structured logging and performance monitoring to maintain visibility
4. Testing: Develop automated tests to ensure reliability and catch regressions
5. Security: Follow security best practices for credential management and input validation
### Next Steps
As you implement these patterns in your own workflows:
- Start with simple workflows and gradually add complexity
- Test thoroughly in non-production environments
- Document your workflows and maintain configuration management
- Monitor performance and optimize bottlenecks
- Build a library of reusable components for common tasks
The investment in well-architected shell script automation pays dividends in reliability, maintainability, and operational efficiency. By following these advanced techniques, you'll be able to create sophisticated automation solutions that can handle the complexities of modern IT environments while remaining understandable and maintainable by your team.
Remember that workflow automation is an iterative process. Continuously refine your scripts based on operational experience, changing requirements, and lessons learned from production usage. The patterns and techniques in this guide will serve as your foundation for building increasingly sophisticated automation solutions.