How to Use wget to Download Files
Table of Contents
1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Basic wget Syntax](#basic-wget-syntax)
4. [Installation Guide](#installation-guide)
5. [Basic File Downloads](#basic-file-downloads)
6. [Advanced Download Options](#advanced-download-options)
7. [Downloading Multiple Files](#downloading-multiple-files)
8. [Website Mirroring and Recursive Downloads](#website-mirroring-and-recursive-downloads)
9. [Authentication and Cookies](#authentication-and-cookies)
10. [Network Configuration](#network-configuration)
11. [Common Use Cases](#common-use-cases)
12. [Troubleshooting](#troubleshooting)
13. [Best Practices](#best-practices)
14. [Advanced Tips and Tricks](#advanced-tips-and-tricks)
15. [Performance Optimization](#performance-optimization)
16. [Conclusion](#conclusion)
Introduction
wget (its name combines "World Wide Web" and "get") is a powerful, free command-line utility for downloading files from web servers. Originally developed for Unix-like systems, wget has become an essential tool for system administrators, developers, and power users who need to download files efficiently from the internet. This comprehensive guide will teach you everything you need to know about using wget, from basic file downloads to advanced website mirroring techniques.
Whether you're downloading a single file, creating backups of websites, or automating download processes in scripts, wget provides the flexibility and reliability you need. Unlike web browsers, wget operates entirely from the command line and can handle interrupted downloads, follow redirects, and work with various authentication methods.
Prerequisites
Before diving into wget usage, ensure you have:
- Operating System: Linux, macOS, or Windows with WSL/Cygwin
- Command Line Access: Terminal or command prompt
- Internet Connection: Active network connection
- Basic Command Line Knowledge: Understanding of basic terminal commands
- Permissions: Appropriate write permissions for download directories
System Requirements
- Memory: Minimal RAM requirements (typically under 50MB)
- Storage: Sufficient disk space for downloaded files
- Network: Stable internet connection for reliable downloads
Basic wget Syntax
The fundamental syntax of wget follows this pattern:
```bash
wget [OPTIONS] [URL]
```
Essential Components
- wget: The command itself
- OPTIONS: Flags that modify wget's behavior
- URL: The web address of the file or resource to download
Simple Example
```bash
wget https://example.com/file.zip
```
This basic command downloads `file.zip` from the specified URL to the current directory.
Installation Guide
Linux Systems
Most Linux distributions include wget by default. If not installed:
Ubuntu/Debian:
```bash
sudo apt update
sudo apt install wget
```
CentOS/RHEL/Fedora:
```bash
sudo yum install wget
# or, on newer versions:
sudo dnf install wget
```
Arch Linux:
```bash
sudo pacman -S wget
```
macOS
Using Homebrew:
```bash
brew install wget
```
Using MacPorts:
```bash
sudo port install wget
```
Windows
Windows Subsystem for Linux (WSL):
```bash
sudo apt install wget
```
Git Bash or Cygwin:
Download and install through their respective package managers.
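For example, Cygwin's setup program can install packages unattended. A minimal sketch, assuming `setup-x86_64.exe` has already been downloaded to the current directory:
```bash
# Install wget through Cygwin's setup program in unattended mode
./setup-x86_64.exe --quiet-mode --packages wget
```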
Verification
Confirm installation by checking the version:
```bash
wget --version
```
Basic File Downloads
Single File Download
The simplest wget operation downloads a single file:
```bash
wget https://releases.ubuntu.com/20.04/ubuntu-20.04.3-desktop-amd64.iso
```
This command:
- Downloads the Ubuntu ISO file
- Saves it in the current directory
- Uses the original filename
- Shows download progress
Specifying Output Filename
Use the `-O` (capital O) option to specify a custom filename:
```bash
wget -O ubuntu-desktop.iso https://releases.ubuntu.com/20.04/ubuntu-20.04.3-desktop-amd64.iso
```
Downloading to Specific Directory
Use the `-P` option to specify the download directory:
```bash
wget -P /home/user/downloads/ https://example.com/file.pdf
```
Background Downloads
For large files, run wget in the background:
```bash
wget -b https://example.com/largefile.zip
```
Download progress is logged to the `wget-log` file in the current directory.
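To watch the progress of a background download, follow the log file:
```bash
# Follow the background download's log as it grows
tail -f wget-log
```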
Advanced Download Options
Resume Interrupted Downloads
Use the `-c` (continue) option to resume partial downloads:
```bash
wget -c https://example.com/largefile.zip
```
This is particularly useful for large files or unstable connections.
Limiting Download Speed
Control bandwidth usage with the `--limit-rate` option:
```bash
wget --limit-rate=200k https://example.com/file.zip
```
Common rate formats:
- `200k` - 200 kilobytes per second
- `1m` - 1 megabyte per second
- `500` - 500 bytes per second
Timeout Settings
Configure timeout values for better reliability:
```bash
wget --timeout=30 --tries=3 https://example.com/file.pdf
```
Options explained:
- `--timeout=30`: Wait 30 seconds for response
- `--tries=3`: Attempt download 3 times before giving up
User Agent Modification
Some servers block or restrict access based on user agents:
```bash
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" https://example.com/file.zip
```
Quiet and Verbose Modes
Quiet mode (suppress output):
```bash
wget -q https://example.com/file.pdf
```
Verbose mode (detailed output):
```bash
wget -v https://example.com/file.pdf
```
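There is also a middle ground: `-nv` (no-verbose) prints one concise line per download instead of a full progress display, which keeps logs readable:
```bash
wget -nv https://example.com/file.pdf
```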
Downloading Multiple Files
From Text File List
Create a text file with URLs (one per line):
```bash
# urls.txt
https://example.com/file1.pdf
https://example.com/file2.zip
https://example.com/file3.tar.gz
```
Download all files:
```bash
wget -i urls.txt
```
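The `-i` option combines well with other flags. For example, a batch run that resumes partial files and pauses briefly between requests:
```bash
# Resume-friendly batch download with a one-second pause between files
wget -c --wait=1 -i urls.txt
```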
Brace Expansion Patterns
Download multiple files using bash brace expansion (the shell expands the pattern before wget ever sees it):
```bash
wget https://example.com/files/document{1..10}.pdf
```
This downloads `document1.pdf` through `document10.pdf`. Because the expansion is performed by bash, it won't work in shells without brace expansion.
Sequential Downloads
For numbered files:
```bash
for i in {1..5}; do
wget https://example.com/file$i.zip
done
```
Website Mirroring and Recursive Downloads
Basic Website Mirroring
Mirror an entire website:
```bash
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com
```
Options explained:
- `--mirror`: Enable mirroring options
- `--convert-links`: Convert links for local viewing
- `--adjust-extension`: Add appropriate extensions
- `--page-requisites`: Download CSS, images, etc.
- `--no-parent`: Don't ascend to parent directory
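In practice, it is worth combining these flags with the politeness options covered under Best Practices; a sketch:
```bash
# Mirror politely: pause between requests and cap bandwidth
wget --mirror --convert-links --adjust-extension --page-requisites \
     --no-parent --wait=1 --random-wait --limit-rate=500k \
     https://example.com
```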
Recursive Download with Depth Limit
Control recursion depth:
```bash
wget -r -l 2 https://example.com/documentation/
```
- `-r`: Recursive download
- `-l 2`: Limit recursion to 2 levels deep
Filtering File Types
Download only specific file types:
```bash
wget -r -A "pdf,doc,txt" https://example.com/documents/
```
Reject specific file types:
```bash
wget -r -R "gif,jpg,jpeg,png" https://example.com/
```
Domain Restrictions
Restrict recursion to specific hosts. Following links onto another host requires `-H` (host spanning); `--domains` then limits which hosts may be followed:
```bash
wget -r -H --domains=example.com,subdomain.example.com https://example.com/
```
Authentication and Cookies
HTTP Authentication
Basic Authentication:
```bash
wget --http-user=username --http-password=password https://example.com/protected/file.zip
```
Prompt for Password:
```bash
wget --http-user=username --ask-password https://example.com/protected/file.zip
```
FTP Authentication
```bash
wget --ftp-user=username --ftp-password=password ftp://ftp.example.com/file.zip
```
Using Cookies
Save cookies:
```bash
wget --save-cookies cookies.txt --keep-session-cookies https://example.com/login
```
Load cookies:
```bash
wget --load-cookies cookies.txt https://example.com/protected/file.zip
```
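Together, these options support a typical login-then-download flow. A minimal sketch, assuming a form-based login at `/login` with `username` and `password` fields (the endpoint and field names will differ from site to site):
```bash
# Log in and capture the session cookie (form fields are assumptions)
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data="username=alice&password=secret" \
     -O /dev/null https://example.com/login

# Reuse the saved session for protected downloads
wget --load-cookies cookies.txt https://example.com/protected/file.zip
```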
Certificate Handling
Ignore SSL certificate errors (use cautiously):
```bash
wget --no-check-certificate https://example.com/file.zip
```
Specify CA certificate:
```bash
wget --ca-certificate=mycert.pem https://example.com/file.zip
```
Network Configuration
Proxy Settings
wget has no `--http-proxy` command-line option; it reads proxy settings from the standard environment variables or from `.wgetrc`, and they can also be set per-invocation with `-e`.
HTTP/HTTPS Proxy:
```bash
# Via environment variables
export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080
wget https://example.com/file.zip

# Or per-invocation with -e
wget -e use_proxy=yes -e http_proxy=http://proxy.example.com:8080 https://example.com/file.zip
```
SOCKS Proxy:
wget does not support SOCKS proxies natively; to use one, run wget through a wrapper such as proxychains.
IPv4/IPv6 Preferences
Force IPv4:
```bash
wget -4 https://example.com/file.zip
```
Force IPv6:
```bash
wget -6 https://example.com/file.zip
```
Connection Settings
wget downloads each file over a single connection and cannot split a file across parallel connections. You can, however, tune redirect limits and retry behavior:
```bash
wget --max-redirect=5 --retry-connrefused https://example.com/file.zip
```
Common Use Cases
1. Downloading Software Releases
```bash
#!/bin/bash
# Download the latest software release
VERSION="3.2.1"
wget -O software-${VERSION}.tar.gz \
"https://github.com/project/releases/download/v${VERSION}/software-${VERSION}.tar.gz"
```
2. Website Backup
```bash
#!/bin/bash
# Complete website backup
SITE="example.com"
DATE=$(date +%Y%m%d)
mkdir -p backups/${DATE}
cd backups/${DATE}
wget --mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
--directory-prefix=${SITE} \
https://${SITE}
```
3. Downloading Documentation
```bash
# Download all PDF documentation
wget -r -l1 -A "*.pdf" -np https://example.com/docs/
```
4. API Data Retrieval
```bash
# Download JSON data from an API
wget --header="Authorization: Bearer TOKEN" \
--header="Content-Type: application/json" \
-O data.json \
"https://api.example.com/data"
```
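When an API call misbehaves, `-S` (`--server-response`) prints the HTTP response headers, which helps distinguish authentication failures from rate limiting:
```bash
# Print response headers (on stderr) while saving the body
wget -S --header="Authorization: Bearer TOKEN" -O data.json "https://api.example.com/data"
```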
5. Batch Image Downloads
```bash
#!/bin/bash
# Download images from a list
while IFS= read -r url; do
filename=$(basename "$url")
wget -O "images/$filename" "$url"
sleep 1 # Be respectful to the server
done < image_urls.txt
```
Troubleshooting
Common Error Messages
"Connection refused"
```bash
# Solution: check the URL and network connectivity, or use a proxy
wget --retry-connrefused --waitretry=1 --read-timeout=20 --timeout=15 -t 0 URL
```
"Certificate verification failed"
```bash
# Temporary workaround (use with caution)
wget --no-check-certificate URL
# Better solution: update the CA certificate store
sudo apt update && sudo apt install ca-certificates
```
"403 Forbidden"
```bash
# Try a different user agent
wget --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1)" URL
```
"File already exists"
```bash
# By default, wget keeps the existing file and writes the new copy as
# file.1, file.2, and so on. To overwrite instead, name the output explicitly:
wget -O file.zip URL
# To skip files that already exist locally:
wget -nc URL
# To re-download only when the remote copy is newer:
wget -N URL
```
Network Issues
Slow downloads:
```bash
# Use multiple attempts and longer timeouts
wget --tries=3 --timeout=30 --read-timeout=60 URL
```
Unstable connections:
```bash
# Enable resume and retry indefinitely
wget -c --tries=0 --retry-connrefused URL
```
Permission Problems
Cannot write to directory:
```bash
# Check permissions
ls -la /path/to/directory
# Change to a writable directory
cd ~/Downloads
wget URL
```
Debug Mode
Enable debug output for troubleshooting:
```bash
wget --debug URL
```
Best Practices
1. Be Respectful to Servers
Add delays between requests:
```bash
wget --wait=2 --random-wait -r URL
```
Limit connection rate:
```bash
wget --limit-rate=100k URL
```
2. Use Appropriate User Agents
Don't impersonate browsers unnecessarily, but use descriptive user agents:
```bash
wget --user-agent="MyScript/1.0 (contact@example.com)" URL
```
3. Handle Errors Gracefully
In scripts, check exit codes:
```bash
#!/bin/bash
if wget -q URL; then
echo "Download successful"
else
echo "Download failed with exit code $?"
exit 1
fi
```
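wget documents distinct exit codes (for example, 4 for a network failure, 5 for an SSL verification failure, 8 for a server error response), so a script can branch on the specific cause; a sketch:
```bash
#!/bin/bash
# Branch on wget's documented exit codes (see man wget)
wget -q "$1"
rc=$?
case $rc in
    0) echo "Download successful" ;;
    4) echo "Network failure" ;;
    5) echo "SSL verification failure" ;;
    6) echo "Authentication failure" ;;
    8) echo "Server issued an error response (e.g. 404)" ;;
    *) echo "wget failed with exit code $rc" ;;
esac
```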
4. Organize Downloads
Create directory structures:
```bash
DATE=$(date +%Y-%m-%d)
mkdir -p downloads/$DATE
wget -P downloads/$DATE URL
```
5. Log Activities
Keep download logs:
```bash
wget -o download.log -b URL
```
6. Security Considerations
Verify checksums when available:
```bash
wget https://example.com/file.zip
wget https://example.com/file.zip.sha256
sha256sum -c file.zip.sha256
```
Use HTTPS when possible:
```bash
# Prefer HTTPS over HTTP
wget https://example.com/file.zip
```
7. Configuration Files
Create `~/.wgetrc` for default settings:
```bash
# ~/.wgetrc
timeout = 30
tries = 3
continue = on
user_agent = MyWget/1.0
```
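Command-line options take precedence over `.wgetrc`, and any `.wgetrc` directive can be set for a single run with `-e`:
```bash
# Override the continue=on default from .wgetrc for one invocation
wget -e continue=off https://example.com/file.zip
```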
8. Monitoring Large Downloads
Use progress indicators:
```bash
wget --progress=bar:force:noscroll URL
```
For scripts, use dot progress:
```bash
wget --progress=dot:giga URL
```
9. Bandwidth Management
During business hours, limit bandwidth:
```bash
#!/bin/bash
HOUR=$(date +%H)
if [ $HOUR -ge 9 ] && [ $HOUR -le 17 ]; then
RATE="--limit-rate=100k"
else
RATE=""
fi
wget $RATE URL
```
10. Error Recovery
Implement retry logic:
```bash
#!/bin/bash
MAX_ATTEMPTS=3
ATTEMPT=1
while [ $ATTEMPT -le $MAX_ATTEMPTS ]; do
if wget -c URL; then
echo "Download successful on attempt $ATTEMPT"
break
else
echo "Attempt $ATTEMPT failed"
ATTEMPT=$((ATTEMPT + 1))
sleep 5
fi
done
if [ $ATTEMPT -gt $MAX_ATTEMPTS ]; then
echo "Download failed after $MAX_ATTEMPTS attempts"
exit 1
fi
```
Advanced Tips and Tricks
1. Custom Headers
Send custom HTTP headers:
```bash
wget --header="X-API-Key: your-api-key" \
--header="Accept: application/json" \
URL
```
2. POST Requests
Send POST data:
```bash
wget --post-data="param1=value1&param2=value2" \
--header="Content-Type: application/x-www-form-urlencoded" \
URL
```
3. Following Redirects
Control redirect behavior:
```bash
wget --max-redirect=10 URL
```
4. Timestamping
Only download when the remote file is newer than the local copy:
```bash
wget -N URL
```
Note that `-N` has no effect when combined with `-O`.
5. Spider Mode
Check links without downloading:
```bash
wget --spider URL
```
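Combined with recursion and a log file, spider mode makes a simple broken-link checker. A sketch (the exact "broken link" wording in the log may vary between wget versions):
```bash
# Crawl two levels deep without saving files, then scan the log
wget --spider -r -l 2 -o spider.log https://example.com/
grep -i "broken link" spider.log
```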
Performance Optimization
1. Concurrent Downloads
Use GNU parallel for multiple simultaneous downloads:
```bash
parallel -j 4 wget {} :::: urls.txt
```
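If GNU parallel is not installed, `xargs` can provide the same fan-out:
```bash
# Four concurrent downloads using xargs instead of GNU parallel
xargs -n 1 -P 4 wget -q < urls.txt
```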
2. Memory Usage
wget streams downloads directly to disk, so its memory footprint stays small even for multi-gigabyte files; there is no buffer-size option to tune.
3. DNS Caching
DNS lookups are cached for the duration of a single wget run (this is the default; disable it with `--no-dns-cache`), so fetching several URLs from the same host in one invocation avoids repeated lookups:
```bash
wget URL1 URL2 URL3
```
Conclusion
wget is an incredibly versatile and powerful tool for downloading files from the internet. From simple single-file downloads to complex website mirroring operations, wget provides the functionality needed for virtually any download scenario. This comprehensive guide has covered everything from basic usage to advanced techniques, troubleshooting, and best practices.
Key takeaways from this guide:
1. Start Simple: Begin with basic downloads and gradually incorporate advanced options as needed
2. Be Respectful: Always consider server load and implement appropriate delays and rate limiting
3. Handle Errors: Implement proper error handling and retry logic in automated scripts
4. Security First: Use HTTPS when possible and verify file integrity when checksums are available
5. Organize Efficiently: Structure your downloads and maintain proper logging for future reference
Whether you're a system administrator automating backups, a developer downloading dependencies, or a researcher gathering data, mastering wget will significantly improve your productivity and reliability when working with web-based resources.
Remember to always respect robots.txt files, terms of service, and server resources when using wget for automated downloads. With great power comes great responsibility, and wget certainly provides great power for file downloading and web scraping tasks.
Continue exploring wget's extensive documentation with `man wget` or `wget --help` to discover even more options and capabilities that can be tailored to your specific use cases.