How to automate complex workflows with shell scripts

How to Automate Complex Workflows with Shell Scripts Table of Contents 1. [Introduction](#introduction) 2. [Prerequisites](#prerequisites) 3. [Understanding Complex Workflow Automation](#understanding-complex-workflow-automation) 4. [Essential Shell Scripting Concepts](#essential-shell-scripting-concepts) 5. [Building Modular Script Architecture](#building-modular-script-architecture) 6. [Advanced Flow Control and Logic](#advanced-flow-control-and-logic) 7. [Error Handling and Recovery](#error-handling-and-recovery) 8. [Parallel Processing and Job Management](#parallel-processing-and-job-management) 9. [Real-World Workflow Examples](#real-world-workflow-examples) 10. [Integration with External Systems](#integration-with-external-systems) 11. [Monitoring and Logging](#monitoring-and-logging) 12. [Testing and Debugging](#testing-and-debugging) 13. [Best Practices](#best-practices) 14. [Common Issues and Troubleshooting](#common-issues-and-troubleshooting) 15. [Conclusion](#conclusion) Introduction Shell script automation has evolved from simple command sequences to sophisticated workflow orchestration systems capable of managing complex enterprise processes. This comprehensive guide explores advanced techniques for creating robust, maintainable, and scalable shell scripts that can handle intricate workflows involving multiple systems, conditional logic, error recovery, and parallel processing. Modern workflow automation requires scripts that can adapt to changing conditions, handle failures gracefully, and provide comprehensive monitoring and reporting. Whether you're managing data pipelines, deployment processes, system maintenance tasks, or integration workflows, mastering advanced shell scripting techniques will significantly enhance your automation capabilities. Prerequisites Before diving into complex workflow automation, ensure you have: Technical Requirements - Operating System: Linux, macOS, or Unix-like environment - Shell: Bash 4.0+ (recommended) or compatible shell - Text Editor: vi/vim, nano, or your preferred editor - Basic Tools: grep, sed, awk, curl, jq (for JSON processing) Knowledge Requirements - Basic Shell Scripting: Variables, loops, conditionals - Command Line Proficiency: File operations, process management - System Administration: Understanding of processes, permissions, networking - Regular Expressions: Pattern matching and text processing Environment Setup ```bash Verify bash version bash --version Install essential tools (Ubuntu/Debian) sudo apt-get update sudo apt-get install jq curl wget git Install essential tools (CentOS/RHEL) sudo yum install jq curl wget git Create workspace directory mkdir -p ~/workflow-automation cd ~/workflow-automation ``` Understanding Complex Workflow Automation Complex workflows typically involve multiple interconnected tasks that must execute in specific sequences, handle various conditions, and recover from failures. Understanding the components of such workflows is crucial for effective automation. Workflow Characteristics Sequential Dependencies: Tasks that must complete before others can begin Parallel Processing: Independent tasks that can run simultaneously Conditional Logic: Decision points based on data or system states Error Recovery: Mechanisms to handle and recover from failures Resource Management: Efficient use of system resources and external services Planning Your Workflow Before writing code, create a workflow diagram that identifies: 1. Entry Points: How the workflow initiates 2. Task Dependencies: Which tasks depend on others 3. Decision Points: Where conditional logic is needed 4. Failure Scenarios: What can go wrong and how to handle it 5. Exit Conditions: How the workflow completes successfully Essential Shell Scripting Concepts Advanced Variable Handling ```bash #!/bin/bash Configuration management with associative arrays declare -A CONFIG CONFIG[database_host]="localhost" CONFIG[database_port]="5432" CONFIG[max_retries]="3" CONFIG[timeout]="30" Function to load configuration from file load_config() { local config_file="$1" if [[ -f "$config_file" ]]; then while IFS='=' read -r key value; do # Skip comments and empty lines [[ $key =~ ^[[:space:]]*# ]] && continue [[ -z "$key" ]] && continue # Remove quotes and whitespace key=$(echo "$key" | sed 's/^[[:space:]]//;s/[[:space:]]$//') value=$(echo "$value" | sed 's/^[[:space:]]//;s/[[:space:]]$//;s/^"//;s/"$//') CONFIG["$key"]="$value" done < "$config_file" fi } Environment-aware configuration set_environment() { local env="${1:-development}" case "$env" in "production") CONFIG[log_level]="ERROR" CONFIG[debug_mode]="false" ;; "staging") CONFIG[log_level]="WARN" CONFIG[debug_mode]="false" ;; "development") CONFIG[log_level]="DEBUG" CONFIG[debug_mode]="true" ;; esac } ``` Function Libraries and Modularity ```bash #!/bin/bash logging.sh - Logging utility functions SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" LOG_FILE="${SCRIPT_DIR}/workflow.log" LOG_LEVEL="${LOG_LEVEL:-INFO}" Logging function with levels log() { local level="$1" shift local message="$*" local timestamp=$(date '+%Y-%m-%d %H:%M:%S') # Define log level hierarchy declare -A levels=([DEBUG]=0 [INFO]=1 [WARN]=2 [ERROR]=3) local current_level=${levels[$LOG_LEVEL]:-1} local msg_level=${levels[$level]:-1} # Only log if message level is >= current level if (( msg_level >= current_level )); then echo "[$timestamp] [$level] $message" | tee -a "$LOG_FILE" fi } Specialized logging functions log_debug() { log "DEBUG" "$@"; } log_info() { log "INFO" "$@"; } log_warn() { log "WARN" "$@"; } log_error() { log "ERROR" "$@"; } Progress tracking show_progress() { local current="$1" local total="$2" local description="$3" local percent=$((current * 100 / total)) local bar_length=50 local filled_length=$((percent * bar_length / 100)) printf "\r[" printf "%${filled_length}s" | tr ' ' '=' printf "%$((bar_length - filled_length))s" | tr ' ' '-' printf "] %d%% %s" "$percent" "$description" if (( current == total )); then echo "" fi } ``` Building Modular Script Architecture Directory Structure ``` workflow-automation/ ├── bin/ # Main executable scripts │ ├── main-workflow.sh │ └── sub-workflow.sh ├── lib/ # Function libraries │ ├── logging.sh │ ├── database.sh │ ├── api-client.sh │ └── file-operations.sh ├── config/ # Configuration files │ ├── default.conf │ ├── production.conf │ └── development.conf ├── templates/ # Template files │ ├── email-template.html │ └── report-template.sql ├── data/ # Data files and temporary storage │ ├── input/ │ ├── output/ │ └── temp/ └── logs/ # Log files ├── workflow.log └── error.log ``` Main Workflow Controller ```bash #!/bin/bash bin/main-workflow.sh - Main workflow controller set -euo pipefail # Exit on error, undefined vars, pipe failures Get script directory and set up paths SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" PROJECT_ROOT="$(dirname "$SCRIPT_DIR")" LIB_DIR="$PROJECT_ROOT/lib" CONFIG_DIR="$PROJECT_ROOT/config" DATA_DIR="$PROJECT_ROOT/data" LOG_DIR="$PROJECT_ROOT/logs" Create directories if they don't exist mkdir -p "$DATA_DIR"/{input,output,temp} "$LOG_DIR" Source library functions source "$LIB_DIR/logging.sh" source "$LIB_DIR/database.sh" source "$LIB_DIR/api-client.sh" source "$LIB_DIR/file-operations.sh" Global variables WORKFLOW_ID="" START_TIME="" ENVIRONMENT="${ENVIRONMENT:-development}" DRY_RUN="${DRY_RUN:-false}" Initialize workflow initialize_workflow() { START_TIME=$(date '+%Y-%m-%d_%H-%M-%S') WORKFLOW_ID="workflow_${START_TIME}_$$" log_info "Starting workflow: $WORKFLOW_ID" log_info "Environment: $ENVIRONMENT" log_info "Dry run mode: $DRY_RUN" # Load configuration load_config "$CONFIG_DIR/default.conf" load_config "$CONFIG_DIR/$ENVIRONMENT.conf" # Set up signal handlers trap cleanup_workflow EXIT trap 'log_error "Workflow interrupted"; exit 130' INT TERM } Cleanup function cleanup_workflow() { local exit_code=$? local end_time=$(date '+%Y-%m-%d %H:%M:%S') if (( exit_code == 0 )); then log_info "Workflow completed successfully: $WORKFLOW_ID" else log_error "Workflow failed with exit code: $exit_code" fi log_info "End time: $end_time" # Cleanup temporary files if [[ -d "$DATA_DIR/temp" ]]; then find "$DATA_DIR/temp" -name "${WORKFLOW_ID}" -type f -delete 2>/dev/null || true fi } ``` Advanced Flow Control and Logic State Machine Implementation ```bash #!/bin/bash State machine for complex workflow control declare -A WORKFLOW_STATES declare -A STATE_TRANSITIONS CURRENT_STATE="INIT" Define workflow states WORKFLOW_STATES=( [INIT]="initialize_system" [VALIDATE]="validate_inputs" [PROCESS]="process_data" [TRANSFORM]="transform_results" [DEPLOY]="deploy_changes" [VERIFY]="verify_deployment" [COMPLETE]="finalize_workflow" [ERROR]="handle_error" [ROLLBACK]="rollback_changes" ) Define valid state transitions STATE_TRANSITIONS=( [INIT]="VALIDATE ERROR" [VALIDATE]="PROCESS ERROR" [PROCESS]="TRANSFORM ERROR" [TRANSFORM]="DEPLOY ERROR" [DEPLOY]="VERIFY ERROR ROLLBACK" [VERIFY]="COMPLETE ERROR ROLLBACK" [COMPLETE]="" [ERROR]="ROLLBACK" [ROLLBACK]="ERROR" ) Execute state machine execute_state_machine() { local max_iterations=20 local iteration=0 while [[ "$CURRENT_STATE" != "COMPLETE" && "$CURRENT_STATE" != "ERROR" ]]; do ((iteration++)) if (( iteration > max_iterations )); then log_error "State machine exceeded maximum iterations" CURRENT_STATE="ERROR" break fi log_info "Executing state: $CURRENT_STATE (iteration $iteration)" # Get function for current state local state_function="${WORKFLOW_STATES[$CURRENT_STATE]}" if [[ -z "$state_function" ]]; then log_error "No function defined for state: $CURRENT_STATE" CURRENT_STATE="ERROR" continue fi # Execute state function if $state_function; then log_info "State $CURRENT_STATE completed successfully" else log_error "State $CURRENT_STATE failed" CURRENT_STATE="ERROR" fi done log_info "State machine finished in state: $CURRENT_STATE" } ``` Error Handling and Recovery Comprehensive Error Management ```bash #!/bin/bash Advanced error handling and recovery mechanisms Error severity levels declare -A ERROR_LEVELS=( [CRITICAL]=3 [HIGH]=2 [MEDIUM]=1 [LOW]=0 ) Enhanced error handling function handle_error() { local error_code="$1" local error_message="$2" local operation="$3" local severity="${4:-MEDIUM}" local recovery_strategy="${5:-RETRY}" log_error "Error in operation '$operation': $error_message (Code: $error_code)" # Record error details local error_record=$(cat <> "$LOG_DIR/error-details.json" # Execute recovery strategy case "$recovery_strategy" in "RETRY") retry_with_backoff "$operation" 3 ;; "ROLLBACK") execute_rollback_procedure "$operation" ;; "SKIP") log_warn "Skipping operation '$operation' due to error" ;; "MANUAL") request_manual_intervention "$operation" "$error_message" ;; esac } Retry mechanism with exponential backoff retry_with_backoff() { local operation="$1" local max_attempts="${2:-3}" local base_delay="${3:-1}" local max_delay="${4:-60}" local attempt=1 local delay="$base_delay" while (( attempt <= max_attempts )); do log_info "Attempting operation '$operation' (attempt $attempt/$max_attempts)" if eval "$operation"; then log_info "Operation '$operation' succeeded on attempt $attempt" return 0 fi if (( attempt < max_attempts )); then log_warn "Operation '$operation' failed, retrying in ${delay}s" sleep "$delay" # Exponential backoff with jitter delay=$(( delay * 2 + RANDOM % 5 )) if (( delay > max_delay )); then delay="$max_delay" fi fi ((attempt++)) done log_error "Operation '$operation' failed after $max_attempts attempts" return 1 } ``` Parallel Processing and Job Management Advanced Parallel Execution ```bash #!/bin/bash Parallel processing and job management Job management system declare -A ACTIVE_JOBS declare -A JOB_STATUS declare -A JOB_RESULTS MAX_CONCURRENT_JOBS="${MAX_CONCURRENT_JOBS:-4}" Start background job with monitoring start_background_job() { local job_name="$1" local job_command="$2" local job_timeout="${3:-300}" # 5 minutes default # Check if we've reached the maximum concurrent jobs while (( ${#ACTIVE_JOBS[@]} >= MAX_CONCURRENT_JOBS )); do log_debug "Maximum concurrent jobs reached, waiting..." sleep 2 cleanup_completed_jobs done log_info "Starting background job: $job_name" # Start job in background with timeout timeout "$job_timeout" bash -c "$job_command" & local job_pid=$! # Register job ACTIVE_JOBS["$job_name"]="$job_pid" JOB_STATUS["$job_name"]="RUNNING" log_info "Job started: $job_name (PID: $job_pid)" return 0 } Parallel data processing example process_files_parallel() { local input_dir="$1" local output_dir="$2" local processor_function="$3" mkdir -p "$output_dir" local file_count=0 local files=() # Collect files to process while IFS= read -r -d '' file; do files+=("$file") ((file_count++)) done < <(find "$input_dir" -name "*.csv" -print0) if (( file_count == 0 )); then log_warn "No files found in $input_dir" return 0 fi log_info "Processing $file_count files in parallel" # Process files in batches local processed=0 for file in "${files[@]}"; do local job_name="process_$(basename "$file")" local output_file="$output_dir/$(basename "$file" .csv)_processed.csv" start_background_job "$job_name" "$processor_function '$file' '$output_file'" ((processed++)) show_progress "$processed" "$file_count" "Starting jobs" done # Wait for all jobs to complete wait_for_all_jobs log_info "All file processing jobs completed" } ``` Real-World Workflow Examples Data Pipeline Workflow ```bash #!/bin/bash Complete data pipeline workflow example data_pipeline_workflow() { local source_system="$1" local target_system="$2" local date_range="$3" log_info "Starting data pipeline: $source_system -> $target_system" # Step 1: Extract data if ! extract_data_from_source "$source_system" "$date_range"; then handle_error 1 "Data extraction failed" "extract_data" return 1 fi # Step 2: Validate extracted data local extracted_files=("$DATA_DIR/temp"/_extracted_.csv) for file in "${extracted_files[@]}"; do if [[ -f "$file" ]]; then validate_data_quality "$file" || { handle_error 2 "Data quality check failed for $file" "validate_data" return 1 } fi done # Step 3: Transform data in parallel transform_data_parallel "$DATA_DIR/temp" "$DATA_DIR/output" # Step 4: Load to target system load_data_to_target "$target_system" "$DATA_DIR/output" log_info "Data pipeline completed successfully" } Data extraction function extract_data_from_source() { local source="$1" local date_range="$2" case "$source" in "mysql") extract_mysql_data "$date_range" ;; "postgresql") extract_postgresql_data "$date_range" ;; "api") extract_api_data "$date_range" ;; *) log_error "Unsupported source system: $source" return 1 ;; esac } MySQL data extraction extract_mysql_data() { local date_range="$1" local output_file="$DATA_DIR/temp/mysql_extracted_$(date +%Y%m%d).csv" log_info "Extracting data from MySQL for date range: $date_range" mysql -h "${CONFIG[mysql_host]}" \ -u "${CONFIG[mysql_user]}" \ -p"${CONFIG[mysql_password]}" \ -D "${CONFIG[mysql_database]}" \ --batch --raw \ -e "SELECT * FROM transactions WHERE date_created BETWEEN '$date_range'" \ > "$output_file" } ``` Deployment Workflow ```bash #!/bin/bash Application deployment workflow deployment_workflow() { local application="$1" local version="$2" local environment="$3" log_info "Starting deployment: $application v$version to $environment" # Pre-deployment checks if ! pre_deployment_checks "$application" "$environment"; then log_error "Pre-deployment checks failed" return 1 fi # Backup current version backup_current_version "$application" "$environment" add_rollback_command "restore_backup '$application' '$environment'" \ "Restore $application backup in $environment" # Deploy new version if ! deploy_application "$application" "$version" "$environment"; then log_error "Deployment failed, initiating rollback" execute_rollback "Deployment failure" return 1 fi # Post-deployment verification if ! verify_deployment "$application" "$environment"; then log_error "Deployment verification failed, initiating rollback" execute_rollback "Verification failure" return 1 fi log_info "Deployment completed successfully" } Pre-deployment checks pre_deployment_checks() { local application="$1" local environment="$2" log_info "Running pre-deployment checks" # Check disk space local available_space=$(df "$DEPLOY_DIR" | awk 'NR==2 {print $4}') local required_space=1048576 # 1GB in KB if (( available_space < required_space )); then log_error "Insufficient disk space. Required: ${required_space}KB, Available: ${available_space}KB" return 1 fi # Check if application is running if ! systemctl is-active --quiet "$application"; then log_error "Application $application is not running" return 1 fi # Check database connectivity if ! test_database_connection; then log_error "Database connectivity check failed" return 1 fi log_info "Pre-deployment checks passed" return 0 } ``` Integration with External Systems API Integration ```bash #!/bin/bash API integration functions Generic API client with authentication api_call() { local method="$1" local endpoint="$2" local data="$3" local auth_header="Authorization: Bearer ${API_TOKEN}" local content_type="Content-Type: application/json" local curl_opts=( --silent --show-error --fail --max-time 30 --retry 3 --retry-delay 1 --header "$auth_header" --header "$content_type" ) case "$method" in "GET") curl "${curl_opts[@]}" "$endpoint" ;; "POST") curl "${curl_opts[@]}" --data "$data" "$endpoint" ;; "PUT") curl "${curl_opts[@]}" --request PUT --data "$data" "$endpoint" ;; "DELETE") curl "${curl_opts[@]}" --request DELETE "$endpoint" ;; esac } Webhook integration send_webhook_notification() { local webhook_url="$1" local event_type="$2" local message="$3" local payload=$(cat </dev/null 2>&1; then log_info "Database health check passed" return 0 else log_error "Database health check failed" return 1 fi } ``` Monitoring and Logging Comprehensive Logging System ```bash #!/bin/bash Advanced logging and monitoring Structured logging with JSON output log_structured() { local level="$1" local component="$2" local event="$3" local message="$4" shift 4 local metadata="$*" local log_entry=$(cat <> "$LOG_DIR/structured.log" # Also log to standard log for readability log "$level" "[$component:$event] $message" } Performance monitoring start_performance_monitor() { local operation_name="$1" local monitor_file="/tmp/monitor_${operation_name}_$$.json" local start_data=$(cat < "$monitor_file" echo "$monitor_file" } stop_performance_monitor() { local monitor_file="$1" if [[ -f "$monitor_file" ]]; then local start_data=$(cat "$monitor_file") local operation=$(echo "$start_data" | jq -r '.operation') local start_time=$(echo "$start_data" | jq -r '.start_time') local start_memory=$(echo "$start_data" | jq -r '.start_memory') local end_time=$(date +%s) local end_memory=$(ps -o rss= -p $$ | tr -d ' ') local duration=$((end_time - start_time)) local memory_diff=$((end_memory - start_memory)) log_structured "INFO" "performance" "operation_completed" \ "Operation $operation completed" \ "\"duration\": $duration, \"memory_used\": $memory_diff" rm -f "$monitor_file" fi } ``` Testing and Debugging Unit Testing Framework ```bash #!/bin/bash Simple unit testing framework for shell scripts Test framework variables declare -a TEST_FUNCTIONS declare -A TEST_RESULTS TEST_COUNT=0 PASSED_COUNT=0 FAILED_COUNT=0 Test assertion functions assert_equals() { local expected="$1" local actual="$2" local test_name="$3" ((TEST_COUNT++)) if [[ "$expected" == "$actual" ]]; then TEST_RESULTS["$test_name"]="PASS" ((PASSED_COUNT++)) log_info "✓ $test_name" else TEST_RESULTS["$test_name"]="FAIL" ((FAILED_COUNT++)) log_error "✗ $test_name - Expected: '$expected', Got: '$actual'" fi } assert_true() { local condition="$1" local test_name="$2" if eval "$condition"; then assert_equals "true" "true" "$test_name" else assert_equals "true" "false" "$test_name" fi } Run all tests run_tests() { log_info "Running unit tests..." # Discover and run test functions for func in $(declare -F | grep "^declare -f test_" | sed 's/declare -f //'); do log_info "Running $func" $func done # Print summary log_info "Test Results: $PASSED_COUNT passed, $FAILED_COUNT failed, $TEST_COUNT total" if (( FAILED_COUNT > 0 )); then return 1 fi } Example test functions test_config_loading() { # Test configuration loading CONFIG[test_key]="test_value" assert_equals "test_value" "${CONFIG[test_key]}" "Config loading test" } test_file_operations() { # Test file operations local test_file="/tmp/test_file_$$" echo "test content" > "$test_file" assert_true "[[ -f '$test_file' ]]" "File creation test" local content=$(cat "$test_file") assert_equals "test content" "$content" "File content test" rm -f "$test_file" } ``` Debugging Utilities ```bash #!/bin/bash Debugging utilities Debug mode activation enable_debug_mode() { set -x # Enable trace mode LOG_LEVEL="DEBUG" CONFIG[debug_mode]="true" log_info "Debug mode enabled" } Function execution tracer trace_function() { local function_name="$1" shift log_debug ">>> Entering function: $function_name with args: $*" # Execute function with timing local start_time=$(date +%s.%N) "$function_name" "$@" local exit_code=$? local end_time=$(date +%s.%N) local duration=$(echo "$end_time - $start_time" | bc) log_debug "<<< Exiting function: $function_name (exit: $exit_code, duration: ${duration}s)" return $exit_code } Interactive debugger debug_breakpoint() { local message="${1:-Breakpoint reached}" if [[ "${CONFIG[debug_mode]}" == "true" ]]; then log_warn "BREAKPOINT: $message" echo "Variables:" set | grep -E '^[A-Z_]+=' | head -20 echo "Press Enter to continue, 'q' to quit, or 'c' to disable breakpoints..." read -r response case "$response" in 'q'|'quit') log_info "Exiting due to user request" exit 0 ;; 'c'|'continue') CONFIG[debug_mode]="false" log_info "Breakpoints disabled" ;; esac fi } ``` Best Practices Security Considerations ```bash #!/bin/bash Security best practices Secure credential handling load_credentials() { local credential_file="$1" # Check file permissions if [[ $(stat -c %a "$credential_file") != "600" ]]; then log_error "Credential file has unsafe permissions" return 1 fi # Load credentials into memory only while IFS='=' read -r key value; do [[ $key =~ ^[[:space:]]*# ]] && continue [[ -z "$key" ]] && continue # Export as environment variable with prefix export "CRED_${key}=${value}" done < "$credential_file" } Input sanitization sanitize_input() { local input="$1" local pattern="$2" # Remove potentially dangerous characters input=$(echo "$input" | tr -d ';|&$`<>') # Validate against pattern if provided if [[ -n "$pattern" ]] && [[ ! "$input" =~ $pattern ]]; then log_error "Input validation failed: $input" return 1 fi echo "$input" } Secure temporary file creation create_secure_temp_file() { local prefix="${1:-workflow}" local temp_file=$(mktemp "/tmp/${prefix}.XXXXXX") # Set restrictive permissions chmod 600 "$temp_file" # Add to cleanup list TEMP_FILES+=("$temp_file") echo "$temp_file" } ``` Performance Optimization ```bash #!/bin/bash Performance optimization techniques Efficient file processing process_large_file() { local input_file="$1" local chunk_size="${2:-1000}" log_info "Processing large file: $input_file" # Process file in chunks to manage memory split -l "$chunk_size" "$input_file" "/tmp/chunk_" for chunk in /tmp/chunk_*; do process_chunk "$chunk" & done wait # Wait for all background processes # Cleanup chunks rm -f /tmp/chunk_* } Memory-efficient data processing efficient_data_transform() { local input_file="$1" local output_file="$2" # Use pipes to avoid loading entire file into memory awk 'BEGIN {OFS=","} {gsub(/old_value/, "new_value"); print}' "$input_file" > "$output_file" } Optimize system resource usage optimize_system_resources() { # Set optimal nice value renice -n 10 $$ # Limit memory usage if available if command -v ulimit >/dev/null; then ulimit -m 1048576 # 1GB memory limit fi # Set optimal I/O scheduling if [[ -f /sys/block/sda/queue/scheduler ]]; then echo "deadline" > /sys/block/sda/queue/scheduler 2>/dev/null || true fi } ``` Common Issues and Troubleshooting Debugging Common Problems ```bash #!/bin/bash Common troubleshooting utilities System resource checker check_system_resources() { log_info "Checking system resources..." # Check memory usage local memory_usage=$(free | grep Mem | awk '{print int($3/$2 * 100.0)}') if (( memory_usage > 90 )); then log_warn "High memory usage: ${memory_usage}%" fi # Check disk space local disk_usage=$(df / | tail -1 | awk '{print int($5)}') if (( disk_usage > 85 )); then log_warn "High disk usage: ${disk_usage}%" fi # Check load average local load_avg=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//') local cpu_cores=$(nproc) local load_threshold=$(echo "$cpu_cores * 2" | bc) if (( $(echo "$load_avg > $load_threshold" | bc -l) )); then log_warn "High system load: $load_avg (threshold: $load_threshold)" fi } Network connectivity troubleshooter troubleshoot_connectivity() { local host="$1" local port="$2" log_info "Troubleshooting connectivity to $host:$port" # Check DNS resolution if ! nslookup "$host" >/dev/null 2>&1; then log_error "DNS resolution failed for $host" return 1 fi # Check port connectivity if command -v nc >/dev/null; then if ! nc -z "$host" "$port" 2>/dev/null; then log_error "Cannot connect to $host:$port" return 1 fi fi log_info "Connectivity check passed" } Log analyzer analyze_logs() { local log_file="$1" if [[ ! -f "$log_file" ]]; then log_error "Log file not found: $log_file" return 1 fi local error_count=$(grep -c "ERROR" "$log_file") local warn_count=$(grep -c "WARN" "$log_file") log_info "Log analysis for $log_file:" log_info " Errors: $error_count" log_info " Warnings: $warn_count" if (( error_count > 0 )); then log_info "Recent errors:" grep "ERROR" "$log_file" | tail -5 fi } Process troubleshooter troubleshoot_processes() { local process_name="$1" log_info "Troubleshooting process: $process_name" local pids=$(pgrep "$process_name") if [[ -z "$pids" ]]; then log_warn "No processes found matching: $process_name" return 1 fi for pid in $pids; do local cpu_usage=$(ps -p "$pid" -o %cpu= | tr -d ' ') local memory_usage=$(ps -p "$pid" -o %mem= | tr -d ' ') local status=$(ps -p "$pid" -o state= | tr -d ' ') log_info "PID $pid: CPU=$cpu_usage%, Memory=$memory_usage%, Status=$status" done } ``` Error Recovery Patterns ```bash #!/bin/bash Common error recovery patterns Graceful degradation implement_graceful_degradation() { local primary_service="$1" local fallback_service="$2" if ! check_service_health "$primary_service"; then log_warn "Primary service unavailable, switching to fallback" if check_service_health "$fallback_service"; then log_info "Using fallback service: $fallback_service" export ACTIVE_SERVICE="$fallback_service" return 0 else log_error "Both primary and fallback services unavailable" return 1 fi fi export ACTIVE_SERVICE="$primary_service" } Auto-recovery mechanism setup_auto_recovery() { local service_name="$1" local recovery_command="$2" while true; do if ! check_service_health "$service_name"; then log_warn "Service $service_name is down, attempting recovery" if eval "$recovery_command"; then log_info "Service $service_name recovered successfully" else log_error "Failed to recover service $service_name" sleep 60 # Wait before next attempt fi fi sleep 30 # Check every 30 seconds done & local recovery_pid=$! RECOVERY_PROCESSES+=("$recovery_pid") log_info "Auto-recovery enabled for $service_name (PID: $recovery_pid)" } ``` Conclusion Mastering complex workflow automation with shell scripts requires understanding multiple interconnected concepts: modular architecture, advanced error handling, parallel processing, external system integration, and comprehensive monitoring. The techniques and patterns presented in this guide provide a solid foundation for building robust, maintainable, and scalable automation solutions. Key Takeaways 1. Modular Design: Break complex workflows into manageable, reusable components 2. Error Handling: Implement comprehensive error detection, recovery, and rollback mechanisms 3. Monitoring: Use structured logging and performance monitoring to maintain visibility 4. Testing: Develop automated tests to ensure reliability and catch regressions 5. Security: Follow security best practices for credential management and input validation Next Steps As you implement these patterns in your own workflows: - Start with simple workflows and gradually add complexity - Test thoroughly in non-production environments - Document your workflows and maintain configuration management - Monitor performance and optimize bottlenecks - Build a library of reusable components for common tasks The investment in well-architected shell script automation pays dividends in reliability, maintainability, and operational efficiency. By following these advanced techniques, you'll be able to create sophisticated automation solutions that can handle the complexities of modern IT environments while remaining understandable and maintainable by your team. Remember that workflow automation is an iterative process. Continuously refine your scripts based on operational experience, changing requirements, and lessons learned from production usage. The patterns and techniques in this guide will serve as your foundation for building increasingly sophisticated automation solutions.