uniq
Overview
The uniq command filters out repeated lines in a file or input stream. It works only on adjacent duplicate lines, so input is typically sorted first.
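Adjacency is the key detail. A quick demonstration with made-up input shows why sorting matters:

```shell
# uniq only collapses runs of identical adjacent lines.
printf 'a\nb\na\n' | uniq          # all three lines survive: the two a's are not adjacent
printf 'a\nb\na\n' | sort | uniq   # prints "a" and "b": sort made the a's adjacent
```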
Syntax
uniq [options] [input [output]]
Common Options
Option | Description
---|---
-c | Count occurrences of each line
-d | Show duplicated lines only
-u | Show unique (non-repeated) lines only
-i | Ignore case when comparing
-f n | Skip the first n fields when comparing
-s n | Skip the first n characters when comparing
-w n | Compare at most the first n characters
--group | Print all lines, separating groups of duplicates with blank lines
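The three most common options can be seen side by side on a small, already-sorted sample (the fruit data here is invented for illustration):

```shell
# Sorted sample: three "apple" lines followed by one "banana".
printf 'apple\napple\napple\nbanana\n' | uniq -c   # counts each run; count-column width varies by implementation
printf 'apple\napple\napple\nbanana\n' | uniq -d   # prints only "apple" (the repeated line)
printf 'apple\napple\napple\nbanana\n' | uniq -u   # prints only "banana" (the line appearing once)
```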
Key Use Cases
- Remove duplicate lines
- Count line occurrences
- Find unique entries
- Data deduplication
- Log analysis
Examples with Explanations
Example 1: Remove Duplicates
sort file.txt | uniq
Removes adjacent duplicate lines
Example 2: Count Occurrences
sort file.txt | uniq -c
Shows count of each unique line
Example 3: Show Only Duplicates
sort file.txt | uniq -d
Shows only lines that appear multiple times
Understanding Behavior
Important notes:
- Only removes adjacent duplicates
- Usually used with sort first
- Case-sensitive by default
- Compares entire lines unless told otherwise (see -f, -s, -w)
Common Usage Patterns
Deduplicate sorted data:
sort data.txt | uniq > clean.txt
Find most common entries:
sort file.txt | uniq -c | sort -nr
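The same frequency pipeline run on a small invented sample, so the output shape is visible: `uniq -c` prefixes each line with its count, and `sort -nr` orders those counts numerically, highest first.

```shell
# Counts per line, most frequent first (b appears 3 times, a twice, c once).
printf 'b\na\nb\nb\na\nc\n' | sort | uniq -c | sort -nr
```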
Case-insensitive deduplication:
sort file.txt | uniq -i
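A minimal contrast between the default comparison and -i, using an invented two-line input:

```shell
printf 'Apple\napple\n' | uniq     # two lines: comparison is case-sensitive by default
printf 'Apple\napple\n' | uniq -i  # one line; the first spelling in the run is the one kept
```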
Field-Based Operations
Skip fields:
uniq -f 2 file.txt
Skip characters:
uniq -s 5 file.txt
Compare specific width:
uniq -w 10 file.txt
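The field and width options are easiest to see on lines that differ only in the part being skipped. The log-style lines below are invented sample data; note that -w is a GNU coreutils extension and may be absent on BSD systems:

```shell
# -f skips whitespace-separated fields before comparing; the timestamps differ,
# but the rest of each line matches, so -f 1 treats the lines as duplicates.
printf '09:01 login ok\n09:02 login ok\n' | uniq -f 1

# -w limits the comparison width (GNU extension); the first 5 characters match.
printf 'alpha1\nalpha2\n' | uniq -w 5
```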
Advanced Usage
Group similar lines:
sort file.txt | uniq --group
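--group is a GNU coreutils extension: rather than dropping anything, it keeps every line and writes a blank line between runs of identical lines, which is useful as input to further group-wise processing.

```shell
# Keeps all lines; a blank separator line appears between the "a" run and "b".
printf 'a\na\nb\n' | uniq --group
```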
Show unique only:
sort file.txt | uniq -u
Complex counting:
sort file.txt | uniq -c | awk '$1 > 5'
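The awk filter keys off the count column that `uniq -c` prepends. A lower threshold on an invented sample makes the effect concrete:

```shell
# Report only lines appearing more than twice: $1 is the count, $2 the line.
printf 'a\na\na\nb\nb\nc\n' | sort | uniq -c | awk '$1 > 2 {print $2}'
```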
Performance Analysis
- Very fast operation
- Memory usage minimal
- Works well with large files
- Streaming operation (doesn’t load entire file)
- Efficient for pipeline processing
Best Practices
- Always sort input first
- Use with other text processing tools
- Consider case sensitivity needs
- Test field/character skipping carefully
- Use counting for analysis
Common Patterns
Top 10 most frequent:
sort file.txt | uniq -c | sort -nr | head -10
Find unique IPs in log:
awk '{print $1}' access.log | sort | uniq
Deduplicate and drop all blank lines (grep -v '^$' removes every blank line, not just duplicates):
sort file.txt | uniq | grep -v '^$'
Integration Examples
With grep:
grep "pattern" *.log | sort | uniq -c
With cut:
cut -d',' -f1 data.csv | sort | uniq
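The same cut-then-uniq pipeline with inline sample CSV data (the names and columns are invented), listing the distinct values of the first column:

```shell
# Distinct values of the first comma-separated column.
printf 'alice,1\nbob,2\nalice,3\n' | cut -d',' -f1 | sort | uniq
```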
Log analysis (sort must read its entire input before producing output, so it cannot follow tail -f; use a bounded slice instead):
tail -n 1000 access.log | sort | uniq -c
Troubleshooting
- Duplicates not removed: input must be sorted first (or duplicates must already be adjacent)
- Case sensitivity: add -i for case-insensitive comparison
- Field counting: -f counts whitespace-separated fields, which may not match your delimiter
- Character encoding: visually identical lines with different byte encodings will not match
- Large files: uniq itself streams, but a preceding sort may need temporary disk space