CS155: Introduction to Unix

Spring 2018

Commands 2

Text Manipulation

Vocabulary: script, wc, grep, sort, cut, uniq
Punctuation: ;
Grammar: command [option]... [argument]... [redirection]
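The grammar line can be read against a concrete command. The following is only a sketch: it works in a throwaway directory created with mktemp, and the file name listing.txt is made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
cd "$(mktemp -d)"

# command  [option]...  [argument]...  [redirection]
#   ls       -a -1          .           > listing.txt
ls -a -1 . > listing.txt

# The semicolon separates two commands on one line;
# they run one after the other.
echo first; echo second
```

The ls output goes into listing.txt instead of the screen, while the two echo commands print first and then second.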

Learning More with man and info

script: Recording a session

wc: Counting Lines and Words

wc is used to get information about the contents of a file.

% cd ~/pub
% cat dwarfs.txt
Grumpy is in love
Sneezy has a red nose
Happy — plump
Doc — glasses
Bashful doesn’t end in “y”
Sleepy — somnambulent
Dopey — unbearded
% wc dwarfs.txt
  7  22 149 dwarfs.txt

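That’s 7 lines, 22 words, and 149 bytes. For plain ASCII text, a byte is the same as a character; non-ASCII characters, such as the long dashes and curly quotes in this file, take up more than one byte each.

Note that everything that takes up room counts as a byte: a letter, a digit, a space, a dot, a newline character. Spaces and newlines count!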
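To check that spaces and newlines really are counted, one can feed wc a tiny input whose size is easy to work out by hand. A minimal sketch, using printf to produce exactly the bytes shown:

```shell
# "a b" plus a newline: two letters, one space, one newline.
printf 'a b\n' | wc -c

# The same text without the trailing newline is one byte shorter.
printf 'a b' | wc -c

# wc -l counts newline characters, so with no newline it reports 0 lines.
printf 'a b' | wc -l
```

The three commands should print 4, 3, and 0 (some versions of wc pad the numbers with leading spaces).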

wc : Some Options

% cd ~/pub

% wc -c dwarfs.txt
149 dwarfs.txt

% wc -l dwarfs.txt
7 dwarfs.txt

% wc -cw dwarfs.txt
 22 149 dwarfs.txt

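Individual counts can be selected with options: -l for lines, -w for words, -c for bytes. The counts are always printed in the order lines, words, bytes, whatever order the options are given in.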

If multiple files are listed on the command line, then multiple counts are computed, and a total is also given. wc is useful when combined with other commands using | (the pipe symbol).
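Both points can be tried out on a pair of small files. This is a sketch only; the file names a.txt and b.txt are invented, and everything happens in a scratch directory.

```shell
# Scratch directory with two small files.
cd "$(mktemp -d)"
printf 'one\ntwo\nthree\n' > a.txt
printf 'four\nfive\n' > b.txt

# Two files: one count per file, then a "total" line.
wc -l a.txt b.txt

# Reading from a pipe: count how many entries ls lists.
ls | wc -l
```

The first command reports 3 lines for a.txt, 2 for b.txt, and a total of 5. The second prints 2, because ls lists two files and wc counts one line per file.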

grep : looking for text in a file

grep is an extremely useful command (it’s also extremely complicated). The simplest use is to search for an exact piece of text in a list of files.

% grep 'el' ~/pub/greek
Delta
% grep "rty-t" ~/pub/numbers
thirty-two
thirty-three
forty-two
forty-three

More grepping

When you search multiple files, each line of output has the form filename:line of text. That way, you know which output came from which file.

% cd ~/pub
% grep "el" greek numbers
greek:Delta
numbers:eleven
numbers:twelve

The -n option causes the line number to be printed:

% cd ~/pub
% grep -n 'el' greek numbers
greek:4:Delta
numbers:11:eleven
numbers:12:twelve

Case matters

UPPER/lower-case matters:

% cd ~/pub
% grep "et" greek
Beta
Zeta
Theta
% grep "Et" greek
Eta
% grep -i "et" greek
Beta
Zeta
Eta
Theta

The -i option makes the search case-insensitive.

grep: Simple Patterns

Some symbols let you search for inexact patterns. Alas, * and ? mean something different in grep patterns than they do in filename wildcards.

Pattern     Meaning in grep
.           any single character
[aeiou]     a single vowel
[aeiou]*    zero or more vowels in a row
[a-z]       a lowercase letter
^           start of line
$           end of line
\?          zero or one of what came before
*           zero or more of what came before
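Because * means one thing to the shell and another to grep, a side-by-side comparison may help. This is a sketch with invented file names (words.txt, cat.txt, cart.txt) in a scratch directory.

```shell
cd "$(mktemp -d)"
printf 'ct\ncat\ncaat\ncot\n' > words.txt
touch cat.txt cart.txt

# Shell wildcard: * stands for any run of characters in a file name.
ls ca*.txt

# grep pattern: * means "zero or more of the preceding character",
# so 'ca*t' matches c, then any number of a's, then t.
grep 'ca*t' words.txt
```

ls lists cart.txt and cat.txt, while grep prints ct, cat, and caat but not cot. Quoting the pattern keeps the shell from expanding it as a wildcard before grep sees it.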

Not filename patterns

grep examples

Command                       Matches
grep 'e[as]t' my_file         “eat” or “west” but not “east”
grep 'b[oi]y' *html           “boy” but not “BOY”
grep 'windows* ' *html        “window ” and “windows ”
grep 'window.' *html          “window-” and “window,” and “windows”
grep '^Jack' foo              “Jack” only at the start of the line
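A few of these rows can be checked without creating any files by piping test words into grep. This is a small sketch, and the test words are chosen only for illustration.

```shell
# Only the lines that match are printed.
printf 'eat\nwest\neast\n' | grep 'e[as]t'

# 'windows* ' needs a space right after "window" or "windows".
printf 'window \nwindows \nwindow-\n' | grep 'windows* '

# ^ anchors the match to the start of the line.
printf 'Jack Sprat\nHit the road, Jack\n' | grep '^Jack'
```

The first command prints eat and west, the second prints the two lines that end in a space, and the third prints only Jack Sprat.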

grep examples: .

% cat ~/pub/greek
Alpha
Beta
Gamma
Delta
Epsilon
Zeta
Eta
Theta
Iota
Kappa
Lambda
Mu
Nu
Xi
Omicron
Pi
Rho
Sigma
Tau
Upsilon
Phi
Chi
Psi
Omega

A dot (.) matches any single character:

% grep "e.a" ~/pub/greek
Beta
Zeta
Theta
Omega
% grep "e..a" ~/pub/greek
Delta
% grep "......." ~/pub/greek
Epsilon
Omicron
Upsilon

grep examples: […]

A character class ([aeiou]) matches any single character within it.

% grep "[JACK]" ~/pub/greek
Alpha
Kappa
Chi
% grep "[BCD]" ~/pub/greek
Beta
Delta
Chi
% grep "[B-D]" ~/pub/greek
Beta
Delta
Chi

grep examples: ^

^ matches at the beginning of the line.

% grep -i "o" ~/pub/greek
Epsilon
Iota
Omicron
Rho
Upsilon
Omega
% grep -i "^o" ~/pub/greek
Omicron
Omega

grep examples: $

$ matches at the end of the line.

% grep -i "u" ~/pub/greek
Mu
Nu
Tau
Upsilon
% grep -i "u$" ~/pub/greek
Mu
Nu
Tau
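Used together, ^ and $ pin a pattern to the whole line. The sketch below recreates the list of Greek letters in a temporary file (the lecture's copy lives in ~/pub/greek) and tries two such patterns.

```shell
# Recreate the Greek letter list in a scratch file.
greek=$(mktemp)
printf '%s\n' Alpha Beta Gamma Delta Epsilon Zeta Eta Theta Iota Kappa \
    Lambda Mu Nu Xi Omicron Pi Rho Sigma Tau Upsilon Phi Chi Psi Omega > "$greek"

# Exactly two characters, with nothing before or after them.
grep '^..$' "$greek"

# Lines that start with a capital vowel and end in "a".
grep '^[AEIOU].*a$' "$greek"
```

The first command prints Mu, Nu, Xi, and Pi. The second prints Alpha, Eta, Iota, and Omega.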

grep examples: \?

\? matches zero or one of what came before.

% grep 'et' ~/pub/greek
Beta
Zeta
Theta
% grep 'elt' ~/pub/greek
Delta
% grep 'el\?t' ~/pub/greek
Beta
Delta
Zeta
Theta
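The backslash form \? belongs to grep's default (basic) pattern syntax as implemented by GNU grep. With the -E option (extended patterns), the same thing is written as a plain ?. This sketch assumes GNU grep, as found on typical Linux systems, and compares the two spellings on a few test words.

```shell
# Basic pattern syntax: the optional "l" is written l\?
printf 'Beta\nDelta\nEta\nGamma\n' | grep 'el\?t'

# Extended pattern syntax (-E): the same pattern with a plain ?
printf 'Beta\nDelta\nEta\nGamma\n' | grep -E 'el?t'
```

Both commands print Beta and Delta. Eta is not matched because its E is uppercase.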

grep examples: *

* matches zero or more of what came before.

% grep 'ma' ~/pub/greek
Gamma
Sigma
% grep 'm.*a' ~/pub/greek
Gamma
Lambda
Sigma
Omega
% grep 'm[a-d]*a' ~/pub/greek
Gamma
Lambda
Sigma

sort : reorder lines of text

sort reorders lines of text lexicographically: lines are compared character by character, from the first character to the last, as in a dictionary.

Usage: sort [OPTION]... [FILE]...

-n makes sort do a numeric sort
-r reverses the result of the comparisons
-u removes duplicates while sorting
-t'delimiter' -kpos1,pos2 can be used to sort by column

When -u is combined with -k, uniqueness is checked only on the field(s) specified by -k.
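The column options are easier to follow on a concrete file. The sketch below uses a made-up file, scores.txt, with a name and a number separated by a colon; neither the file nor its contents come from the lecture.

```shell
cd "$(mktemp -d)"
printf 'carol:30\nalice:4\nbob:30\ndave:125\n' > scores.txt

# Sort on the second colon-separated field, as text:
# "125" comes before "30", which comes before "4".
sort -t':' -k2,2 scores.txt

# Add -n to compare that field as a number instead.
sort -n -t':' -k2,2 scores.txt

# With -u, lines whose key field is equal count as duplicates,
# so only one of the two lines with 30 survives.
sort -n -u -t':' -k2,2 scores.txt
```

Note that -k2,2 restricts the key to field 2 alone; -k2 by itself would mean "from field 2 to the end of the line".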

sort examples

% cat ~/pub/dwarfs.txt
Grumpy is in love
Sneezy has a red nose
Happy — plump
Doc — glasses
Bashful doesn’t end in “y”
Sleepy — somnambulent
Dopey — unbearded
% sort ~/pub/dwarfs.txt
Bashful doesn’t end in “y”
Doc — glasses
Dopey — unbearded
Grumpy is in love
Happy — plump
Sleepy — somnambulent
Sneezy has a red nose
% sort -r ~/pub/dwarfs.txt
Sneezy has a red nose
Sleepy — somnambulent
Happy — plump
Grumpy is in love
Dopey — unbearded
Doc — glasses
Bashful doesn’t end in “y”

sort: An Example

% cat my_file2
1234
10
10000
5679
% sort my_file2
10
10000
1234
5679
% sort -n my_file2
10
1234
5679
10000
% sort -n <my_file2
(same as above)
% cat my_file2 | sort -n
(same as above)

cut: Selecting columns from a file

cut allows us to select columns from a file. Options:

-d "delimiter"
Specify your own delimiter, which is between the columns.
-f field-list
A comma-separated list of columns or ranges.

Examples:

     cut -d";" -f3 filename
     cut -f2,3,5,7 -d"/" filename
     grep "x" filename | cut -d"," -f1,3,5-7,9-

Field-based cut examples

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega
% cut -d"," -f 3 data
Gamma
Iota
Rho
Omega
% cut -d"," -f 5 data
Epsilon
Lambda
Tau

% cut -d"," -f 2,4 data
Beta,Delta
Theta,Kappa
Pi,Sigma
Psi

More field-based cut examples

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega
% cut -d"," -f 2-4 data
Beta,Gamma,Delta
Theta,Iota,Kappa
Pi,Rho,Sigma
Psi,Omega
% cut -d"," -f 1-3,5-7 data
Alpha,Beta,Gamma,Epsilon,Zeta
Eta,Theta,Iota,Lambda,Mu,Nu
Omicron,Pi,Rho,Tau,Upsilon,Phi
Chi,Psi,Omega

Character-based cut

cut can also use -c character-list to obtain ranges of characters (not fields). No delimiter is needed.

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega
% cut -c5-12 data
a,Beta,G
Theta,Io
ron,Pi,R
Psi,Omeg
% cut -c1-3,5,7-10 data
AlpaBeta
EtaTeta,
Omirn,Pi
ChiPi,Om
% cut -c10- data
a,Gamma,Delta,Epsilon,Zeta
,Iota,Kappa,Lambda,Mu,Nu,Xi
i,Rho,Sigma,Tau,Upsilon,Phi
mega

uniq: Selecting unique lines from a file

uniq removes repeated lines from a file. It only compares adjacent lines, so the input is normally sorted first.

uniq can also be used to print only lines that are unique to a file (with -u) or only those that are repeated (with -d).
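A short experiment shows why sorting matters and what -u and -d select. This is a sketch on a made-up file, letters.txt, in a scratch directory.

```shell
cd "$(mktemp -d)"
printf 'b\na\nb\nc\nc\n' > letters.txt

# Unsorted input: only the adjacent pair of c's is collapsed,
# so b still appears twice.
uniq letters.txt

# Sorted input: every repeat is adjacent, so each letter appears once.
sort letters.txt | uniq

# -u keeps only lines that occur exactly once.
sort letters.txt | uniq -u

# -d keeps one copy of each line that occurs more than once.
sort letters.txt | uniq -d
```

The four commands print b a b c, then a b c, then a, then b c (one letter per line).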

The combination of sort, cut, and uniq is a powerful tool for text manipulation in Unix.

A More Complex Example

Consider the following example file, my_file3.

user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa

We want to find out how many unique ID numbers there are and also get a list of names and passwords sorted by user name.

Example

% cat my_file3
user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa
% cut -f2 -d" " my_file3
4125142
1415511
9999999
1415511
0011292
% cut -f2 -d" " my_file3 | sort -n
0011292
1415511
1415511
4125142
9999999

Example

% cut -f2 -d" " my_file3 | sort -n | uniq
0011292
1415511
4125142
9999999
% sort my_file3
user1 4125142 passwd
user2 9999999 p_2ad(
user3 1415511 f#afk@
user4 1415511 m#@!ad
user5 0011292 lkdfaa
% sort my_file3 | cut -f1,3 -d" "
user1 passwd
user2 p_2ad(
user3 f#afk@
user4 m#@!ad
user5 lkdfaa
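The pipeline above lists the unique ID numbers; to get the count itself, one more stage can be added. The sketch below recreates my_file3 in a scratch directory and pipes the unique IDs into wc -l.

```shell
cd "$(mktemp -d)"
cat > my_file3 <<'EOF'
user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa
EOF

# List the distinct ID numbers, then count them with wc -l.
cut -f2 -d" " my_file3 | sort -n | uniq | wc -l

# sort -u does the work of "sort | uniq" in one step.
cut -f2 -d" " my_file3 | sort -n -u | wc -l
```

Both pipelines print 4: five users, but two of them share the ID 1415511.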

Modified: 2018-01-25T20:32

Colorado State University, Fort Collins, CO 80523 USA
© 2018 Colorado State University