Text Processing Tools in Unix
Unix provides a rich set of tools to search, filter, manipulate, and organize text. These tools are often combined using pipes (|).
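For example, the output of one command can be fed straight into another (a minimal illustration; any directory works):
# List only directories: ls -l marks them with a leading "d"
ls -l | grep "^d"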
1. grep – Search Text
- Stands for Global Regular Expression Print.
- Searches for patterns in files or input and prints matching lines.
grep "error" logfile.txt # Find lines containing "error"
grep -i "error" logfile.txt # Case-insensitive search
grep -r "TODO" ./ # Recursively search in current directory
grep -n "error" logfile.txt # Show line numbers
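A few other standard flags are worth knowing (shown against the same hypothetical logfile.txt):
grep -v "error" logfile.txt # Invert match: print lines NOT containing "error"
grep -c "error" logfile.txt # Count matching lines instead of printing them
grep -E "error|warn" logfile.txt # Extended regex: match either pattern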
2. awk – Pattern Scanning and Processing
- A powerful programming language for text processing.
- Works well with columns and fields.
# Print 1st and 3rd column of a file
awk '{print $1, $3}' data.txt
# Print lines where 2nd column > 50
awk '$2 > 50' data.txt
# Using field separator (CSV)
awk -F',' '{print $1, $2}' data.csv
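awk can also accumulate values across lines, which makes quick calculations easy. A sketch, assuming data.txt has a numeric 2nd column:
# Sum the 2nd column and print the total after the last line
awk '{sum += $2} END {print "total:", sum}' data.txt
# Average of the 2nd column (NR holds the line count in END)
awk '{sum += $2} END {if (NR > 0) print sum / NR}' data.txt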
3. sed – Stream Editor
- Used for editing text in a stream or file.
- Commonly used for substitution, deletion, and insertion.
# Replace "apple" with "orange" in file.txt
sed 's/apple/orange/g' file.txt
# Delete lines containing "error"
sed '/error/d' file.txt
# Print only lines 2 to 4
sed -n '2,4p' file.txt
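Substitutions can also be limited to specific lines, and sed can edit a file in place with -i (GNU and BSD sed both accept an attached backup suffix, as below):
# Replace "apple" with "orange" only on line 3
sed '3s/apple/orange/' file.txt
# Edit file.txt in place, keeping a .bak backup of the original
sed -i.bak 's/apple/orange/g' file.txt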
4. sort – Sort Text
- Sorts lines of text alphabetically or numerically.
sort file.txt # Alphabetical sort
sort -r file.txt # Reverse sort
sort -n numbers.txt # Numerical sort
sort -k 2 data.txt # Sort by 2nd column
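sort can also take a custom field separator and sort numerically on one key, which is useful for CSV data (a sketch for a hypothetical data.csv):
# Sort CSV rows numerically by the 2nd field only
sort -t',' -k2,2n data.csv
# Sort and drop duplicate lines in one step
sort -u file.txt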
5. uniq – Remove Duplicate Lines
- Filters out consecutive duplicate lines.
- Often combined with sort to remove all duplicates.
uniq file.txt # Remove consecutive duplicates
sort file.txt | uniq # Remove all duplicates
uniq -c file.txt # Count consecutive occurrences of each line
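Two more useful modes, assuming the input has already been sorted:
sort file.txt | uniq -d # Show only lines that appear more than once
sort file.txt | uniq -u # Show only lines that appear exactly once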
6. Combining Tools
Unix tools shine when combined with pipes:
# Find lines containing "error", sort, and remove duplicates
grep "error" logfile.txt | sort | uniq
# Print 2nd column of CSV, sort, and count unique entries
awk -F',' '{print $2}' data.csv | sort | uniq -c
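A slightly longer pipeline in the same spirit, ranking results with sort -rn and head (head is not covered above; this sketch assumes logfile.txt has a date-like 1st column):
# Count "error" lines per value of the 1st column, most frequent first
grep "error" logfile.txt | awk '{print $1}' | sort | uniq -c | sort -rn | head -5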
Summary
- grep → search for patterns
- awk → process columns and perform calculations
- sed → edit text streams or files
- sort → sort lines alphabetically or numerically
- uniq → remove duplicates, optionally count occurrences