wget
Overview
The `wget` command is a non-interactive network downloader that retrieves files from web servers using HTTP, HTTPS, and FTP protocols. It is designed for robust downloading with retry capabilities.
Syntax
wget [options] [URL...]
Common Options
| Option | Description |
|---|---|
| `-O file` | Write output to file |
| `-c` | Continue a partial download |
| `-r` | Recursive download |
| `-np` | Do not ascend to parent directories |
| `-k` | Convert links for local viewing |
| `-p` | Download page requisites (images, CSS) |
| `-m` | Mirror a website |
| `-q` | Quiet mode |
| `-v` | Verbose output |
| `-t n` | Retry up to n times |
| `-T n` | Timeout in seconds |
| `--limit-rate=rate` | Limit download speed (e.g. 200k) |
Download Types
| Type | Description |
|---|---|
| Single file | Download one file |
| Recursive | Download a directory structure |
| Mirror | Complete website copy |
| Resume | Continue an interrupted download |
| Batch | Multiple URLs from a file |
Key Use Cases
- Download files from the web
- Mirror websites
- Automated downloads
- Backup web content
- Batch file retrieval
Examples with Explanations
Example 1: Basic Download
wget https://example.com/file.zip
Downloads file to current directory
Example 2: Save with Different Name
wget -O myfile.zip https://example.com/file.zip
Downloads and saves with specified name
Example 3: Resume Download
wget -c https://example.com/largefile.iso
Continues interrupted download
Recursive Downloads
Download website:
wget -r -np -k https://example.com/
Mirror with limits:
wget -m -l 2 https://example.com/
Download directory:
wget -r -np https://example.com/files/
Advanced Options
| Option | Description |
|---|---|
| `--user-agent=agent` | Set the User-Agent header |
| `--referer=url` | Set the Referer header |
| `--header=header` | Add an arbitrary HTTP header |
| `--post-data=data` | Send an HTTP POST with data |
| `--no-cookies` | Disable cookie handling |
| `--no-check-certificate` | Skip SSL certificate verification |
| `--spider` | Check that a URL exists without downloading |
Common Usage Patterns
Download with rate limit:
wget --limit-rate=200k https://example.com/file.zip
Background download:
wget -b https://example.com/largefile.iso
Download from file list:
wget -i urls.txt
Authentication
Basic auth (note the password is visible to other local users in the process list; `--ask-password` prompts for it instead):
wget --user=username --password=password URL
Certificate auth:
wget --certificate=cert.pem --private-key=key.pem URL
Cookie authentication:
wget --load-cookies=cookies.txt URL
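Putting passwords on the command line, as in the basic-auth example above, exposes them to other local users via `ps`. wget also reads credentials from `~/.netrc`. This sketch writes a demo file so it is self-contained; in real use the content goes in `~/.netrc` with mode 600, and the host, login, and password here are placeholders:

```shell
# wget falls back to ~/.netrc for credentials when --user/--password are
# not given, keeping secrets out of the process list.
# Written to ./netrc.demo here so the example is self-contained; the real
# file is ~/.netrc. Host, login, and password are placeholders.
cat > netrc.demo <<'EOF'
machine example.com
login username
password secret
EOF

# Keep the file private.
chmod 600 netrc.demo
```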
Performance Analysis
- Efficient for large files
- Good retry mechanisms
- Bandwidth limiting available
- Parallel downloads possible by running multiple wget instances
- Resume capability reduces waste
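On the parallelism point: wget has no built-in parallel mode, but `xargs -P` can run several wget processes at once. The URL list below is a made-up example, and the final command is shown as a dry run with `echo` prefixed:

```shell
# wget downloads one URL at a time; parallelism comes from running several
# wget processes. xargs handles that: -n 1 passes one URL per invocation,
# -P 4 keeps up to 4 processes running. URLs are placeholders.
printf '%s\n' \
  'https://example.com/a.zip' \
  'https://example.com/b.zip' \
  'https://example.com/c.zip' > urls.txt

# Dry run: "echo" prints each command instead of executing it.
# Remove "echo" to perform the actual downloads.
xargs -n 1 -P 4 echo wget -q < urls.txt
```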
Best Practices
- Use appropriate retry settings
- Respect robots.txt
- Limit download rate for courtesy
- Use resume for large files
- Verify downloaded files
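The last practice, verifying downloaded files, can be scripted with `sha256sum`. The file below is created locally to stand in for a download; in practice the site publishes the checksum alongside the file:

```shell
# Stand-in for a downloaded file (a real one would come from wget).
printf 'example payload\n' > file.zip

# The publisher's checksum file normally sits next to the download;
# here we generate it ourselves for the demonstration.
sha256sum file.zip > file.zip.sha256

# -c re-hashes the file and compares: prints "file.zip: OK" and exits 0
# on a match, non-zero on corruption or tampering.
sha256sum -c file.zip.sha256
```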
Website Mirroring
Complete mirror:
wget -m -p -E -k -K -np https://example.com/
Limited depth:
wget -r -l 3 -k -p https://example.com/
Specific file types:
wget -r -A "*.pdf,*.doc" https://example.com/
Security Considerations
- Verify SSL certificates
- Be cautious with --no-check-certificate
- Validate downloaded content
- Use secure protocols when possible
- Check file integrity
Troubleshooting
- SSL certificate errors
- Connection timeouts
- Server blocking requests
- Disk space issues
- Permission problems
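Diagnosing these failures is easier with wget's documented exit codes (GNU wget manual). A small helper function, our own construct but using the documented code meanings, can translate them:

```shell
# Translate wget's exit status into a diagnosis. The code-to-meaning
# mapping follows the GNU wget manual; the function name is our own.
explain_wget_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "generic error" ;;
    3) echo "file I/O error (check disk space and permissions)" ;;
    4) echo "network failure (check connectivity, DNS, timeouts)" ;;
    5) echo "SSL verification failure (check certificates)" ;;
    6) echo "authentication failure (check username/password)" ;;
    8) echo "server issued an error response (e.g. 404 or 403)" ;;
    *) echo "other error (code $1)" ;;
  esac
}

# Typical use after a download attempt:
#   wget -q "$url"; explain_wget_exit $?
explain_wget_exit 4
```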
Integration Examples
With cron for scheduled downloads:
0 2 * * * wget -q -O /backup/file.zip https://example.com/file.zip
With find for cleanup:
wget https://example.com/file.zip && find . -name "*.tmp" -delete
Batch processing (a while-read loop avoids the word-splitting pitfalls of `for url in $(cat ...)`):
while IFS= read -r url; do wget "$url"; done < urls.txt