Text Processing Tools in Unix
Unix provides a rich set of tools to search, filter, manipulate, and organize text. These tools are often combined using pipes (|).
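For example, the output of one command can be fed straight into another (a minimal illustration; any directory works):
# List only directories: ls -l marks them with a leading "d"
ls -l | grep "^d"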
1. grep – Search Text
- Stands for Global Regular Expression Print.
- Searches for patterns in files or input and prints matching lines.
grep "error" logfile.txt # Find lines containing "error"
grep -i "error" logfile.txt # Case-insensitive search
grep -r "TODO" ./ # Recursively search in current directory
grep -n "error" logfile.txt # Show line numbers
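A few other standard flags are worth knowing (shown against the same hypothetical logfile.txt):
grep -v "error" logfile.txt # Invert match: print lines NOT containing "error"
grep -c "error" logfile.txt # Count matching lines instead of printing them
grep -E "error|warn" logfile.txt # Extended regex: match either pattern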
2. awk – Pattern Scanning and Processing
- A powerful programming language for text processing.
- Works well with columns and fields.
# Print 1st and 3rd column of a file
awk '{print $1, $3}' data.txt
# Print lines where 2nd column > 50
awk '$2 > 50' data.txt
# Using field separator (CSV)
awk -F',' '{print $1, $2}' data.csv
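awk can also accumulate values across lines, which makes quick calculations easy. A sketch, assuming data.txt has a numeric 2nd column:
# Sum the 2nd column and print the total after the last line
awk '{sum += $2} END {print "total:", sum}' data.txt
# Average of the 2nd column (NR holds the line count in END)
awk '{sum += $2} END {if (NR > 0) print sum / NR}' data.txt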
3. sed – Stream Editor
- Used for editing text in a stream or file.
- Commonly used for substitution, deletion, and insertion.
# Replace "apple" with "orange" in file.txt
sed 's/apple/orange/g' file.txt
# Delete lines containing "error"
sed '/error/d' file.txt
# Print only lines 2 to 4
sed -n '2,4p' file.txt
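Substitutions can also be limited to specific lines, and sed can edit a file in place with -i (GNU and BSD sed both accept an attached backup suffix, as below):
# Replace "apple" with "orange" only on line 3
sed '3s/apple/orange/' file.txt
# Edit file.txt in place, keeping a .bak backup of the original
sed -i.bak 's/apple/orange/g' file.txt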
4. sort – Sort Text
- Sorts lines of text alphabetically or numerically.
sort file.txt # Alphabetical sort
sort -r file.txt # Reverse sort
sort -n numbers.txt # Numerical sort
sort -k 2 data.txt # Sort by 2nd column
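sort can also take a custom field separator and sort numerically on one key, which is useful for CSV data (a sketch for a hypothetical data.csv):
# Sort CSV rows numerically by the 2nd field only
sort -t',' -k2,2n data.csv
# Sort and drop duplicate lines in one step
sort -u file.txt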
5. uniq – Remove Duplicate Lines
- Filters out consecutive duplicate lines.
- Often combined with sort to remove all duplicates.
uniq file.txt # Remove consecutive duplicates
sort file.txt | uniq # Remove all duplicates
uniq -c file.txt # Count consecutive occurrences of each line
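Two more useful modes, assuming the input has already been sorted:
sort file.txt | uniq -d # Show only lines that appear more than once
sort file.txt | uniq -u # Show only lines that appear exactly once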
6. Combining Tools
Unix tools shine when combined with pipes:
# Find lines containing "error", sort, and remove duplicates
grep "error" logfile.txt | sort | uniq
# Print 2nd column of CSV, sort, and count unique entries
awk -F',' '{print $2}' data.csv | sort | uniq -c
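A slightly longer pipeline in the same spirit, ranking results with sort -rn and head (head is not covered above; this sketch assumes logfile.txt has a date-like 1st column):
# Count "error" lines per value of the 1st column, most frequent first
grep "error" logfile.txt | awk '{print $1}' | sort | uniq -c | sort -rn | head -5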
Summary
- grep → search for patterns
- awk → process columns and perform calculations
- sed → edit text streams or files
- sort → sort lines alphabetically or numerically
- uniq → remove duplicates, optionally count occurrences