How to fetch a URL → curl

## Table of Contents

1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Basic curl Syntax](#basic-curl-syntax)
4. [Simple URL Fetching](#simple-url-fetching)
5. [Common curl Options](#common-curl-options)
6. [Advanced Usage Examples](#advanced-usage-examples)
7. [Working with Different HTTP Methods](#working-with-different-http-methods)
8. [Handling Authentication](#handling-authentication)
9. [Working with Headers and Cookies](#working-with-headers-and-cookies)
10. [File Downloads and Uploads](#file-downloads-and-uploads)
11. [Troubleshooting Common Issues](#troubleshooting-common-issues)
12. [Best Practices](#best-practices)
13. [Performance Optimization](#performance-optimization)
14. [Conclusion](#conclusion)

## Introduction

curl (Client URL) is a powerful command-line tool and library for transferring data with URLs. It supports numerous protocols, including HTTP, HTTPS, FTP, FTPS, and many others. Whether you're a developer testing APIs, a system administrator monitoring web services, or someone who needs to download files programmatically, curl is an indispensable tool.

This guide covers fetching URLs with curl, from basic usage to advanced techniques. You'll learn how to make HTTP requests, handle different response types, work with authentication, and troubleshoot common issues that arise when working with web services.

By the end of this article, you'll be able to use curl for a wide range of web-related tasks and know which of its features fit your specific needs.
## Prerequisites

Before diving into curl usage, ensure you have the following:

### System Requirements

- Operating System: Linux, macOS, Windows, or another Unix-like system
- curl Installation: Most Unix-like systems come with curl pre-installed
- Terminal Access: Command-line interface access
- Basic Command-Line Knowledge: Familiarity with the terminal or command prompt

### Checking curl Installation

To verify curl is installed on your system, run:

```bash
curl --version
```

This command displays the curl version along with the protocols and features your build supports. If curl isn't installed, you can install it using your system's package manager:

Ubuntu/Debian:

```bash
sudo apt-get update
sudo apt-get install curl
```

CentOS/RHEL/Fedora:

```bash
sudo yum install curl
# or, on newer versions
sudo dnf install curl
```

macOS (curl is pre-installed; Homebrew provides a newer version):

```bash
brew install curl
```

Windows:

- Windows 10 (version 1803) and later ship with `curl.exe`
- Download from the official curl website
- Use Windows Subsystem for Linux (WSL)
- Install via package managers like Chocolatey

## Basic curl Syntax

The fundamental curl syntax follows this pattern:

```bash
curl [options] [URL]
```

### Essential Components

- `curl`: The command itself
- `[options]`: Flags and parameters that modify curl's behavior
- `[URL]`: The target URL to fetch

### Simple Example

```bash
curl https://www.example.com
```

This basic command fetches the content from the specified URL and prints it to your terminal.

## Simple URL Fetching

Let's start with the most basic curl operations to fetch URL content.

### Fetching a Web Page

To retrieve the HTML content of a web page:

```bash
curl https://httpbin.org/html
```

This command downloads the page and prints its complete HTML structure directly in your terminal.

### Fetching JSON Data

When working with APIs that return JSON:

```bash
curl https://httpbin.org/json
```

This retrieves JSON data from the endpoint and prints it as-is. The raw JSON output can be piped to other tools for processing.
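As a sketch of that piping step, the function below pretty-prints whatever JSON arrives on standard input. `pretty_json` is a hypothetical helper name, not part of curl, and the sketch assumes either `jq` or `python3` is installed.

```bash
#!/usr/bin/env bash
# pretty_json (hypothetical helper): pretty-print JSON read from stdin.
# Prefers jq when it is installed and falls back to Python's json.tool
# module otherwise (assumes python3 is on PATH in that case).
# Both tools exit non-zero on malformed JSON.
pretty_json() {
  if command -v jq >/dev/null 2>&1; then
    jq .
  else
    python3 -m json.tool
  fi
}

# Usage (needs network access):
#   curl -s https://httpbin.org/json | pretty_json
```

The `-s` in the usage line keeps curl's progress meter out of the terminal while the formatted JSON is printed.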
### Following Redirects

Many URLs redirect to other locations. Use the `-L` flag to follow redirects automatically:

```bash
curl -L https://bit.ly/2XYZ123
```

Without `-L`, curl prints the redirect response itself (usually a short or empty body) and does not fetch the final destination.

### Silent Mode

To suppress progress information and only show the content:

```bash
curl -s https://api.github.com/users/octocat
```

The `-s` (silent) flag hides the progress meter and error messages, showing only the response content. Combine it with `-S` (`-sS`) if you still want errors reported.

## Common curl Options

Understanding curl's extensive options is crucial for effective usage. Here are the most commonly used flags:

### Output Options

#### Save to File (`-o` and `-O`)

```bash
# Save with custom filename
curl -o myfile.html https://www.example.com

# Save with the filename taken from the URL (file.pdf)
curl -O https://www.example.com/file.pdf
```

#### Append to File

```bash
# The shell's >> operator appends curl's output to an existing file
curl https://api.example.com/data >> accumulated_data.json
```

### Verbose Output

#### Show Detailed Information (`-v`)

```bash
curl -v https://httpbin.org/get
```

This displays detailed information about the request and response, including headers and TLS handshake details.

#### Show Only Headers (`-I`)

```bash
curl -I https://www.example.com
```

The `-I` flag performs a HEAD request, returning only the HTTP headers without the body content.
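When scripting around `-I`, you usually want a single header rather than the whole block. The hypothetical `header_value` function below is a sketch that extracts one header from raw header text on standard input using `awk`. It assumes HTTP/1.1-style `Name: value` lines; with `-L` the output can contain several header blocks, and this sketch returns the match from the first one only.

```bash
#!/usr/bin/env bash
# header_value NAME (hypothetical helper): print the value of the first
# header called NAME found in raw HTTP headers on stdin, e.g. the output
# of `curl -sI URL`. Names are compared case-insensitively, and the CR
# that ends every HTTP header line is stripped first.
header_value() {
  tr -d '\r' | awk -v want="$1" '
    BEGIN { want = tolower(want) }
    {
      i = index($0, ":")            # split "Name: value" at the first colon
      if (i == 0) next              # status line or blank line
      if (tolower(substr($0, 1, i - 1)) == want) {
        value = substr($0, i + 1)
        sub(/^[ \t]+/, "", value)   # drop whitespace after the colon
        print value
        exit
      }
    }'
}

# Usage (needs network access):
#   curl -sI https://www.example.com | header_value content-type
```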
### Request Modification

#### Custom User Agent (`-A`)

```bash
curl -A "MyApp/1.0" https://httpbin.org/user-agent
```

#### Custom Headers (`-H`)

```bash
curl -H "Accept: application/json" -H "Authorization: Bearer token123" https://api.example.com/data
```

#### Request Timeout (`--connect-timeout` and `--max-time`)

```bash
# Give up if no connection is made within 10 s, or if the whole transfer exceeds 30 s
curl --connect-timeout 10 --max-time 30 https://slow-api.example.com
```

## Advanced Usage Examples

### Working with Query Parameters

When a URL contains query parameters, quote it so the shell does not interpret characters such as `&` and `?`, and encode special characters:

```bash
# Simple query parameters
curl "https://httpbin.org/get?param1=value1&param2=value2"

# URL encoding special characters
curl "https://httpbin.org/get?search=hello%20world&category=tech"

# Let curl encode the values (-G appends the data to the URL as a query string;
# plain -d would send the values unencoded)
curl -G --data-urlencode "search=hello world" --data-urlencode "category=tech" https://httpbin.org/get
```

### Handling Multiple URLs

curl can process multiple URLs in a single command:

```bash
# Fetch multiple URLs sequentially
curl https://httpbin.org/get https://httpbin.org/ip https://httpbin.org/user-agent

# Use URL globbing for patterns (quote the URL so the shell leaves the brackets alone)
curl "https://example.com/file[1-5].txt"

# Fetch files with different extensions
curl "https://example.com/file.{jpg,png,gif}"
```

### Rate Limiting and Delays

To avoid overwhelming servers, implement delays between requests:

```bash
# At most 30 requests per minute, i.e. one every 2 seconds (--rate needs curl 7.84.0 or later)
curl --rate 30/m "https://api.example.com/endpoint[1-10]"

# Manual delay using sleep in scripts
for i in {1..10}; do
  curl "https://api.example.com/endpoint/$i"
  sleep 2
done
```

## Working with Different HTTP Methods

curl supports all standard HTTP methods for comprehensive API interaction.
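The method examples that follow share the same few flags, so in scripts it can help to wrap them once. The hypothetical `api_request` function below is a sketch under that assumption: it sets the method with `-X` and attaches a JSON body and `Content-Type` header only when a body is passed. The `Accept` header and the `-sS -f` flags are illustrative choices, not requirements.

```bash
#!/usr/bin/env bash
# api_request METHOD URL [JSON_BODY] (hypothetical helper): send a request
# with an explicit HTTP method, attaching a JSON body only when one is given.
api_request() {
  local method="$1" url="$2" body="${3-}"
  # -sS: hide the progress meter but keep error messages
  # -f:  exit with code 22 on HTTP responses of 400 or above
  local args=(-sS -f -X "$method" -H "Accept: application/json")
  if [ -n "$body" ]; then
    args+=(-H "Content-Type: application/json" --data "$body")
  fi
  curl "${args[@]}" "$url"
}

# Usage (needs network access):
#   api_request GET    https://api.example.com/users/123
#   api_request PATCH  https://api.example.com/users/123 '{"status":"inactive"}'
#   api_request DELETE https://api.example.com/users/123
```

Building the arguments in a bash array keeps values with spaces or quotes, such as JSON bodies, intact as single arguments.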
### GET Requests (Default)

```bash
# Explicit GET request (-X GET is redundant; GET is the default)
curl -X GET https://httpbin.org/get

# GET with query parameters
curl -X GET "https://httpbin.org/get?key=value"
```

### POST Requests

#### Sending Form Data

```bash
# URL-encoded form data (-d on its own already implies POST)
curl -X POST -d "username=john&password=secret" https://httpbin.org/post

# From file
curl -X POST -d @form_data.txt https://httpbin.org/post
```

#### Sending JSON Data

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"name":"John","email":"john@example.com"}' \
  https://httpbin.org/post
```

#### Sending JSON from File

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d @user_data.json \
  https://api.example.com/users
```

### PUT Requests

```bash
# Update resource with JSON
curl -X PUT \
  -H "Content-Type: application/json" \
  -d '{"name":"Updated Name","status":"active"}' \
  https://api.example.com/users/123
```

### DELETE Requests

```bash
# Simple DELETE request
curl -X DELETE https://api.example.com/users/123

# DELETE with authentication
curl -X DELETE \
  -H "Authorization: Bearer your_token_here" \
  https://api.example.com/users/123
```

### PATCH Requests

```bash
# Partial update with PATCH
curl -X PATCH \
  -H "Content-Type: application/json" \
  -d '{"status":"inactive"}' \
  https://api.example.com/users/123
```

## Handling Authentication

curl supports various authentication methods for accessing protected resources.

### Basic Authentication

#### Username and Password

```bash
# Interactive password prompt
curl -u username https://httpbin.org/basic-auth/username/password

# Inline credentials (less secure: the password ends up in your shell history)
curl -u username:password https://httpbin.org/basic-auth/username/password

# From netrc file
curl -n https://protected.example.com/data
```

#### Creating .netrc File

```bash
# ~/.netrc file content (keep it private with: chmod 600 ~/.netrc)
machine api.example.com login your_username password your_password
```

### Bearer Token Authentication

```bash
# API token in header
curl -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..." \
  https://api.example.com/protected

# API key as query parameter (keys in URLs can end up in server and proxy logs)
curl "https://api.example.com/data?api_key=your_api_key_here"
```

### OAuth 2.0 Authentication

```bash
# Using an OAuth 2.0 access token (sent as a bearer token)
curl -H "Authorization: Bearer ACCESS_TOKEN" \
  https://api.github.com/user

# OAuth 1.0a-style header (a real request also needs a computed signature)
curl -H "Authorization: OAuth oauth_consumer_key=key,oauth_token=token..." \
  https://api.twitter.com/1.1/statuses/home_timeline.json
```

### Client Certificate Authentication

```bash
# Using a PEM client certificate and private key
curl --cert client.pem --key client-key.pem https://secure.example.com

# PKCS#12 certificate with password
curl --cert-type P12 --cert client.p12:password https://secure.example.com
```

## Working with Headers and Cookies

### Custom Headers

#### Single Header

```bash
curl -H "Accept: application/json" https://api.example.com/data
```

#### Multiple Headers

```bash
curl -H "Accept: application/json" \
  -H "User-Agent: MyApp/1.0" \
  -H "X-API-Key: secret123" \
  https://api.example.com/data
```

#### Removing Default Headers

```bash
# An empty value after the colon removes the User-Agent header curl would send
curl -H "User-Agent:" https://httpbin.org/headers
```

### Cookie Management

#### Sending Cookies

```bash
# Single cookie
curl -b "session_id=abc123" https://example.com/dashboard

# Multiple cookies
curl -b "session_id=abc123; preference=dark_mode" https://example.com/settings

# From cookie file (an argument without "=" is treated as a file name)
curl -b cookies.txt https://example.com/protected
```

#### Saving Cookies

```bash
# Save cookies to file
curl -c cookies.txt https://example.com/login

# Load and save cookies
curl -b cookies.txt -c cookies.txt https://example.com/dashboard
```

#### Cookie Jar Management

```bash
# Carry cookies across a multi-step session
curl -b cookie-jar.txt -c cookie-jar.txt https://example.com/step1
curl -b cookie-jar.txt -c cookie-jar.txt https://example.com/step2
```

### Response Header Analysis

#### Show Response Headers

```bash
# Include headers in output
curl -i https://httpbin.org/get

# Only headers (HEAD request)
curl -I https://httpbin.org/get

# Dump headers to file
curl -D headers.txt https://example.com
```

## File Downloads and Uploads

### Downloading Files
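Several flags from this guide (`-f`, `-L`, `-o`, and `--retry`) combine well into a small download routine. The hypothetical `download` function below is a sketch, assuming a POSIX-like system: it saves to a temporary `DEST.part` file and renames it only after curl exits successfully, so a failed transfer never leaves a truncated file, or a saved error page, under the final name.

```bash
#!/usr/bin/env bash
# download URL DEST (hypothetical helper): fetch URL into DEST atomically.
download() {
  local url="$1" dest="$2"
  local tmp="${dest}.part"
  # -f: treat HTTP >= 400 as failure, so error pages are not saved
  # -sS: no progress meter, but still print errors; -L: follow redirects
  # --retry 3: retry transient failures (timeouts, some 5xx responses)
  if curl -fsSL --retry 3 -o "$tmp" "$url"; then
    mv -f -- "$tmp" "$dest"        # rename only after a successful transfer
  else
    local status=$?                # curl's exit code
    rm -f -- "$tmp"                # discard any partial data
    return "$status"
  fi
}

# Usage (needs network access):
#   download https://example.com/file.zip file.zip
```

Because the temporary file sits in the same directory as the destination, the final `mv` is a rename on the same filesystem, which is atomic on most systems.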
#### Simple Download

```bash
# Download and save with original name
curl -O https://example.com/file.zip

# Download with custom name
curl -o my_file.zip https://example.com/file.zip
```

#### Resume Interrupted Downloads

```bash
# Resume partial download (-C - lets curl work out the offset from the existing file)
curl -C - -O https://example.com/large_file.zip
```

#### Download with Progress Bar

```bash
# Show a simple progress bar instead of the default progress meter
curl -# -O https://example.com/file.zip
```

#### Parallel Downloads

```bash
# Download multiple files simultaneously using shell background jobs
curl -O https://example.com/file1.zip -O https://example.com/file2.zip &
curl -O https://example.com/file3.zip -O https://example.com/file4.zip &
wait

# curl 7.66.0 and later can run transfers in parallel itself with -Z (--parallel)
curl -Z -O https://example.com/file1.zip -O https://example.com/file2.zip
```

### Uploading Files

#### Form File Upload

```bash
# Upload file as multipart/form-data
curl -F "file=@document.pdf" https://httpbin.org/post

# Upload with additional form fields
curl -F "file=@image.jpg" -F "description=Profile photo" https://api.example.com/upload
```

#### Binary File Upload

```bash
# Upload raw binary data (unlike -d, --data-binary sends the file byte-for-byte)
curl -X POST --data-binary @file.zip https://api.example.com/upload

# Upload with specific content type
curl -X POST \
  -H "Content-Type: application/octet-stream" \
  --data-binary @binary_file.dat \
  https://api.example.com/binary
```

#### FTP Upload

```bash
# Upload to FTP server
curl -T local_file.txt ftp://username:password@ftp.example.com/remote_file.txt

# Upload multiple files
curl -T "{file1.txt,file2.txt}" ftp://username:password@ftp.example.com/
```

## Troubleshooting Common Issues

### Connection Problems

#### SSL/TLS Certificate Issues

```bash
# Skip certificate verification (not recommended for production)
curl -k https://self-signed.example.com

# Specify CA certificate
curl --cacert ca-bundle.crt https://secure.example.com

# Use a directory of CA certificates (the location varies by system)
curl --capath /etc/ssl/certs https://secure.example.com
```

#### Timeout Issues

```bash
# Set connection timeout
curl --connect-timeout 30 https://slow-server.example.com

# Set maximum total time
curl --max-time 60 https://api.example.com/slow-endpoint

# Retry transient failures up to 3 times, waiting 5 seconds between attempts
curl --retry 3 --retry-delay 5 https://unreliable-api.example.com
```
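`--retry` only retries failures curl classifies as transient (timeouts and certain HTTP status codes, per the curl manual), and curl 7.71.0 added `--retry-all-errors` for everything else. When a whole scripted step needs retrying instead, a shell wrapper with exponential backoff is a common pattern. The hypothetical `retry_with_backoff` function below is a sketch; the attempt count and delays are arbitrary choices for illustration.

```bash
#!/usr/bin/env bash
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY CMD [ARGS...]
# (hypothetical helper): run CMD until it succeeds or MAX_ATTEMPTS is
# reached, doubling the delay (in seconds) after every failure.
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1 status
  while true; do
    "$@" && return 0
    status=$?
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts (exit $status)" >&2
      return "$status"
    fi
    echo "attempt $attempt failed (exit $status); retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage (needs network access): 4 attempts, waiting 2, 4, then 8 seconds
#   retry_with_backoff 4 2 curl -fsS -o data.json https://api.example.com/data
```

The wrapper returns the last exit code of the command, so scripts can still tell an HTTP failure (`22` with `-f`) from, say, a timeout (`28`).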
#### DNS Resolution Problems

```bash
# Use specific DNS servers (only works if curl was built with c-ares)
curl --dns-servers 8.8.8.8,8.8.4.4 https://example.com

# Resolve hostname to specific IP
curl --resolve example.com:443:192.168.1.100 https://example.com

# Force IPv4 or IPv6
curl -4 https://example.com  # IPv4 only
curl -6 https://example.com  # IPv6 only
```

### HTTP Error Handling

#### Handle HTTP Error Codes

```bash
# Fail on HTTP errors: exit code 22 and no response body
curl -f https://httpbin.org/status/404

# Silent, but still show an error message for HTTP errors
curl -f -s -S https://httpbin.org/status/500

# Continue on HTTP errors but show status
curl -w "HTTP Status: %{http_code}\n" https://httpbin.org/status/404
```

#### Response Code Checking

```bash
# Get only HTTP status code
curl -s -o /dev/null -w "%{http_code}" https://example.com

# Detailed response information
curl -w "Status: %{http_code}\nTime: %{time_total}s\nSize: %{size_download} bytes\n" \
  https://example.com
```

### Debugging and Logging

#### Verbose Output for Debugging

```bash
# Verbose output
curl -v https://httpbin.org/get

# Full trace of all sent and received data, as plain text
curl --trace-ascii trace.log https://httpbin.org/get

# Full trace including a hex dump of all data
curl --trace trace-hex.log https://httpbin.org/get
```

#### Network Interface Issues

```bash
# Use specific network interface
curl --interface eth0 https://example.com

# Use a local port from a specific range
curl --local-port 8080-8090 https://example.com
```

## Best Practices

### Security Considerations

#### Protect Sensitive Data

```bash
# Use environment variables for sensitive data
export API_TOKEN="your_secret_token"
curl -H "Authorization: Bearer $API_TOKEN" https://api.example.com

# Use configuration files with proper permissions
# (options in ~/.curlrc apply to every curl request you make)
touch ~/.curlrc
chmod 600 ~/.curlrc
echo 'header = "Authorization: Bearer secret_token"' >> ~/.curlrc
```

#### Validate SSL Certificates

```bash
# Certificate verification is on by default; never add -k in production.
# --cert-status additionally requires a valid stapled OCSP response
curl --cert-status https://secure-api.example.com

# Pin the server's public key for critical services
curl --pinnedpubkey sha256//base64encodedkey https://critical-api.example.com
```

### Performance Considerations

#### Connection Reuse

```bash
# Enable HTTP/2 when available
curl --http2 https://http2-enabled-site.example.com

# Tune TCP keepalive probes (seconds of idle time before the first probe);
# this keeps idle connections alive but does not by itself reuse them
curl --keepalive-time 60 https://api.example.com/endpoint1
```

#### Compression

```bash
# Request a compressed response and let curl decompress it
curl --compressed https://api.example.com/large-response

# Specify accepted encodings manually (curl will NOT decompress the output)
curl -H "Accept-Encoding: gzip, deflate, br" https://example.com
```

#### Bandwidth Management

```bash
# Limit download speed
curl --limit-rate 100k https://example.com/large-file.zip

# Limit upload speed
curl --limit-rate 50k -T large-upload.zip ftp://ftp.example.com/
```

### Scripting Best Practices

#### Error Handling in Scripts

```bash
#!/bin/bash
# Append the status code after a marker so body and status can be split apart
response=$(curl -s -w "HTTPSTATUS:%{http_code}" https://api.example.com/data)
http_code="${response##*HTTPSTATUS:}"   # text after the last marker
body="${response%HTTPSTATUS:*}"         # text before the last marker

if [ "$http_code" -eq 200 ]; then
  echo "Success: $body"
else
  echo "HTTP Error: $http_code" >&2
  exit 1
fi
```

#### Configuration Management

```bash
# Use .curlrc for default options
echo 'user-agent = "MyScript/1.0"' >> ~/.curlrc
echo 'connect-timeout = 30' >> ~/.curlrc
echo 'max-time = 120' >> ~/.curlrc
```

#### Logging and Monitoring

```bash
# Comprehensive logging function
log_curl_request() {
  local url="$1"
  local logfile="curl_$(date +%Y%m%d).log"
  # The body goes to response.json; the -w summary is appended to the daily log
  curl -w "URL: %{url_effective}\nHTTP: %{http_code}\nTime: %{time_total}s\nSize: %{size_download}\n" \
    -s -o response.json "$url" >> "$logfile"
}
```

## Performance Optimization

### Advanced Connection Management

#### Connection Pooling

```bash
# URLs on the same host in one invocation reuse a single connection automatically;
# --keepalive-time only adjusts the TCP keepalive probes on that connection
curl --keepalive-time 30 \
  https://api.example.com/endpoint1 \
  https://api.example.com/endpoint2 \
  https://api.example.com/endpoint3
```

#### HTTP/2 and HTTP/3

```bash
# Use HTTP/2 over plain HTTP without an Upgrade (the server must support it)
curl --http2-prior-knowledge http://http2.example.com

# Try HTTP/3 if available (requires a curl build with HTTP/3 support)
curl --http3 https://http3-enabled.example.com
```

### Caching Strategies

#### Conditional Requests

```bash
# Use If-Modified-Since header
curl -H "If-Modified-Since: Sat, 21 Oct 2023 07:28:00 GMT" \
  https://api.example.com/data

# Use ETags for cache validation
curl -H "If-None-Match: \"etag-value-here\"" \
  https://api.example.com/resource
```

### Monitoring and Metrics

#### Performance Metrics

```bash
# Timing breakdown; every value is seconds since the request started
# (-s -o /dev/null hides the body so only the timings print)
curl -s -o /dev/null \
  -w "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS: %{time_appconnect}s\nFirst byte: %{time_starttransfer}s\nTotal: %{time_total}s\n" \
  https://example.com

# Size and speed metrics
curl -w "Downloaded: %{size_download} bytes\nSpeed: %{speed_download} bytes/sec\n" \
  https://example.com/large-file
```

## Conclusion

curl is an incredibly versatile and powerful tool for fetching URLs and interacting with web services. Throughout this guide, we've explored everything from basic URL fetching to advanced authentication methods, file transfers, and performance optimization techniques.

### Key Takeaways

1. Versatility: curl supports numerous protocols and authentication methods, making it suitable for a wide range of web-related tasks.
2. Flexibility: Its extensive set of options lets you tailor requests precisely, from simple GET requests to complex API interactions.
3. Reliability: With proper error handling, timeout configuration, and retry mechanisms, curl can be used reliably in production environments.
4. Security: When configured correctly, with SSL verification left on and credentials kept out of command lines, curl provides secure communication with web services.
5. Performance: Features like HTTP/2 support, connection reuse, and compression help optimize performance for high-volume operations.
### Next Steps

Now that you have a solid understanding of curl, consider these next steps:

- Practice: Experiment with different APIs and services to reinforce your learning
- Automation: Integrate curl into shell scripts and automation workflows
- Monitoring: Use curl for health checks and monitoring in your infrastructure
- API Testing: Use curl for testing and debugging API endpoints
- Advanced Topics: Explore libcurl for programmatic integration in applications

### Additional Resources

- Official Documentation: Visit the curl website for the most up-to-date documentation
- Community: Join curl mailing lists and forums for support and advanced discussions
- Integration: Explore how curl integrates with other tools like jq for JSON processing
- Alternatives: Consider complementary tools like wget, HTTPie, and Postman for different use cases

Remember that mastering curl is an ongoing process. As web technologies evolve, new features and best practices emerge. Stay updated with the latest curl releases and keep practicing with real-world scenarios to maintain and improve your skills.

Whether you're downloading files, testing APIs, or automating web interactions, curl remains an essential tool in any developer's or system administrator's toolkit. With the knowledge gained from this guide, you're well-equipped to handle most URL fetching tasks that come your way.