uniq

Overview

The uniq command filters repeated lines from a file or input stream. It only detects adjacent duplicates, so input is typically sorted first.
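A quick illustration of the adjacent-only behavior, using printf to generate a small made-up input:

```shell
# uniq alone only collapses neighbouring duplicates, so the second
# "apple" survives because "banana" sits between the two copies:
printf 'apple\nbanana\napple\n' | uniq
# apple
# banana
# apple

# Sorting first makes the duplicates adjacent, so uniq can remove them:
printf 'apple\nbanana\napple\n' | sort | uniq
# apple
# banana
```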

Syntax

uniq [options] [input [output]]

Common Options

Option    Description
-c        Prefix each line with its occurrence count
-d        Print only duplicated lines, one copy each
-u        Print only lines that never repeat
-i        Ignore case when comparing
-f n      Skip the first n fields when comparing
-s n      Skip the first n characters when comparing
-w n      Compare no more than the first n characters
--group   Print all lines, separating groups of duplicates with a blank line

Key Use Cases

  1. Remove duplicate lines
  2. Count line occurrences
  3. Find unique entries
  4. Data deduplication
  5. Log analysis

Examples with Explanations

Example 1: Remove Duplicates

sort file.txt | uniq

Removes adjacent duplicate lines

Example 2: Count Occurrences

sort file.txt | uniq -c

Shows count of each unique line

Example 3: Show Only Duplicates

sort file.txt | uniq -d

Shows only lines that appear multiple times
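The three flags above can be compared side by side on one small sample (the input lines are invented for illustration):

```shell
sample() { printf 'a\na\nb\nc\nc\nc\n'; }   # already-sorted sample input

sample | uniq -c   # counts each group:      2 a / 1 b / 3 c
sample | uniq -d   # duplicated lines only:  a, c
sample | uniq -u   # never-repeated only:    b
```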

Understanding Behavior

Important notes:

  • Only removes adjacent duplicates
  • Usually preceded by sort
  • Case-sensitive by default (use -i to ignore case)
  • Compares entire lines unless -f, -s, or -w is given

Common Usage Patterns

  1. Deduplicate sorted data:

    sort data.txt | uniq > clean.txt
  2. Find most common entries:

    sort file.txt | uniq -c | sort -nr
  3. Case-insensitive deduplication:

    sort file.txt | uniq -i
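Pattern 2 above is worth tracing end to end; the status words below are made-up stand-ins for real data:

```shell
# Count each distinct line, then sort the counts numerically in
# reverse so the most frequent entry comes first:
printf 'err\nok\nerr\nwarn\nerr\nok\n' | sort | uniq -c | sort -nr
# highest count first: 3 err, 2 ok, 1 warn
```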

Field-Based Operations

  1. Skip fields:

    uniq -f 2 file.txt
  2. Skip characters:

    uniq -s 5 file.txt
  3. Compare specific width:

    uniq -w 10 file.txt
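A sketch of how the skip options shift the comparison window (sample data invented):

```shell
# -f 1 skips the first blank-separated field, so only "apple"/"banana"
# are compared; the first line of each group is the one kept:
printf '1 apple\n2 apple\n3 banana\n' | uniq -f 1
# 1 apple
# 3 banana

# -w 3 compares only the first three characters of each line:
printf 'abc123\nabc999\nxyz000\n' | uniq -w 3
# abc123
# xyz000
```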

Advanced Usage

  1. Group similar lines:

    sort file.txt | uniq --group
  2. Show unique only:

    sort file.txt | uniq -u
  3. Complex counting:

    sort file.txt | uniq -c | awk '$1 > 5'
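The awk filter in item 3 keeps only lines whose count column exceeds the threshold; a smaller threshold makes the effect easy to see on toy data:

```shell
# Keep only lines occurring at least twice; awk's $1 is the count
# column produced by uniq -c, $2 is the line itself:
printf 'a\na\na\nb\nc\nc\n' | sort | uniq -c | awk '$1 >= 2 {print $2}'
# a
# c
```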

Performance Analysis

  • Very fast: compares only consecutive lines
  • Minimal memory use (holds one line at a time)
  • Works well with large files
  • Streaming operation (doesn't load the entire file)
  • Efficient in pipeline processing

Additional Resources

Best Practices

  1. Always sort input first
  2. Use with other text processing tools
  3. Consider case sensitivity needs
  4. Test field/character skipping carefully
  5. Use counting for analysis

Common Patterns

  1. Top 10 most frequent:

    sort file.txt | uniq -c | sort -nr | head -10
  2. Find unique IPs in log:

    awk '{print $1}' access.log | sort | uniq
  3. Deduplicate and drop blank lines entirely:

    sort file.txt | uniq | grep -v '^$'
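Pattern 2 can be checked without a real log file; the IP addresses below are fabricated stand-ins for the first field of access.log:

```shell
# Extract field 1 (the client IP), sort, and deduplicate:
printf '10.0.0.1 GET /\n10.0.0.2 GET /a\n10.0.0.1 GET /b\n' \
  | awk '{print $1}' | sort | uniq
# 10.0.0.1
# 10.0.0.2
```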

Integration Examples

  1. With grep:

    grep "pattern" *.log | sort | uniq -c
  2. With cut:

    cut -d',' -f1 data.csv | sort | uniq
  3. Log analysis:

    tail -n 1000 access.log | sort | uniq -c

(Note: tail -f cannot be used here, because sort must read all its input before producing any output.)
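The cut pipeline in item 2 behaves like this on a throwaway CSV (column values invented):

```shell
# First CSV column, deduplicated:
printf 'alice,1\nbob,2\nalice,3\n' | cut -d',' -f1 | sort | uniq
# alice
# bob
```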

Troubleshooting

  1. Duplicates not removed (input must be sorted first)
  2. Case differences treated as distinct (add -i)
  3. Field skipping surprises (-f counts blank-separated fields, so tabs and runs of spaces matter)
  4. Character encoding issues (multibyte characters can throw off -s and -w offsets)
  5. Large files (uniq itself streams, but the preceding sort may need temporary disk space)