CS155

CS155: Introduction to Unix

Spring 2018

Commands 2

Text Manipulation

Vocabulary: script, wc, grep, sort, cut, uniq
Punctuation: ;
Grammar: command [option]... [argument]... [redirection]

Learning More with man and info

script: Recording a session

wc: Counting Lines and Words

wc is used to get information about the contents of a file.

% cd ~/pub
% cat dwarfs.txt
Grumpy is in love
Sneezy has a red nose
Happy — plump
Doc — glasses
Bashful doesn’t end in “y”
Sleepy — somnambulent
Dopey — unbearded
% wc dwarfs.txt
  7  22 149 dwarfs.txt

That’s 7 lines, 22 words, and 149 bytes. A byte is the same as a character, more or less.

Note that a byte means a letter, a number, a space, a dot, a newline character—everything that takes up room. Spaces and newlines count!

wc: Some Options

% cd ~/pub

% wc -c dwarfs.txt
149 dwarfs.txt

% wc -l dwarfs.txt
7 dwarfs.txt

% wc -cw dwarfs.txt
 22 149 dwarfs.txt

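Individual counts can be selected with options: -c for bytes, -w for words, and -l for lines. Whatever order the options are given in, the counts are printed in the order lines, words, bytes.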

If multiple files are listed on the command line, then multiple counts are computed, and a total is also given. wc is useful when combined with other commands using | (the pipe symbol).
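The two points above (several files at once, and wc at the end of a pipe) can be tried with a small sketch like the following. The directory, the file names a.txt and b.txt, and their contents are made up for illustration; they are not files in ~/pub.

```shell
# Work in a scratch directory so nothing real is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Two small sample files (hypothetical contents).
printf 'one\ntwo\nthree\n' > a.txt
printf 'four\nfive\n' > b.txt

# Several files: one line of counts per file, then a "total" line.
wc -l a.txt b.txt

# Reading from a pipe: wc prints only the count, with no file name.
# grep keeps the lines that contain "e"; wc -l counts them.
matches=$(cat a.txt b.txt | grep 'e' | wc -l)
echo "lines containing e: $matches"
```

Three of the five lines (one, three, five) contain an "e", so the pipeline reports 3. Some versions of wc pad the number with leading spaces.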

grep: Looking for text in a file

grep is an extremely useful command (it’s also extremely complicated). The simplest use is to search for an exact piece of text in a list of files.

% grep 'el' ~/pub/greek
Delta
% grep "rty-t" ~/pub/numbers
thirty-two
thirty-three
forty-two
forty-three

More grepping

If you search multiple files, each line of output has the form filename:line of text. That way, you know which output came from which file.

% cd ~/pub
% grep "el" greek numbers
greek:Delta
numbers:eleven
numbers:twelve

The -n option causes the line number to be printed:

% cd ~/pub
% grep -n 'el' greek numbers
greek:4:Delta
numbers:11:eleven
numbers:12:twelve

grep: Simple Patterns

Some symbols can be used to search for inexact patterns. Alas, * and ? do not mean the same thing in grep as they do in filename wildcards.

Pattern     Meaning in grep
.           any single character
[aeiou]     a single vowel
[aeiou]*    zero or more vowels
[a-z]       a lowercase letter
^           start of line
$           end of line
\?          zero or one of what came before
*           zero or more of what came before

Remember: these are grep patterns, not filename patterns.

grep examples

Command                    Matches
grep 'e[as]t' my_file      “eat” or “west” but not “east”
grep 'b[oi]y' *html        “boy” but not “BOY”
grep 'windows* ' *html     “window ” and “windows ”
grep 'window.' *html       “window-” and “window,” and “windows”
grep '^Jack' foo           “Jack” only at the start of the line
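To check a few of these patterns yourself, here is a minimal sketch. The test lines are invented for this example, and the file is a temporary one created by mktemp rather than one of the course files.

```shell
# Scratch file with a few hypothetical test lines.
words=$(mktemp)
printf '%s\n' 'eat' 'west' 'east' 'Jack ran' 'See Jack run' > "$words"

# [as] matches exactly one character: e, then a or s, then t.
# "east" has two characters between e and t, so it is not matched.
class_hits=$(grep 'e[as]t' "$words")
printf '%s\n' "$class_hits"

# ^ anchors the pattern to the start of the line.
start_hits=$(grep '^Jack' "$words")
printf '%s\n' "$start_hits"

# $ anchors the pattern to the end of the line.
end_hits=$(grep 'run$' "$words")
printf '%s\n' "$end_hits"

rm -f "$words"
```

The first grep prints eat and west, the second prints only "Jack ran", and the third prints only "See Jack run". The patterns are in single quotes so that the shell passes them to grep untouched.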

sort: Reordering lines of text

sort reorders lines of text lexicographically. This means lines are compared character by character, moving from the first character to the last, the way words are ordered in a dictionary.

Usage: sort [OPTION]... [FILE]...

-n makes sort do a numeric sort
-r reverses the result of the comparisons
-u removes duplicates while sorting
-t'delimiter' -kpos1,pos2 can be used to sort by column

When sorting by column with -u, uniqueness is only checked on the field(s) specified by -k.
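A short sketch of sorting by column may help here. The name:score data below is hypothetical, and ":" is just one possible delimiter.

```shell
# Hypothetical name:score data in a temporary file.
scores=$(mktemp)
printf '%s\n' 'carol:7' 'alice:12' 'bob:7' 'dave:100' > "$scores"

# Fields are separated by ":" (-t:), the sort key is field 2 only (-k2,2),
# and that key is compared as a number (-n).
by_score=$(sort -t: -k2,2 -n "$scores")
printf '%s\n' "$by_score"

# Same key, largest number first.
sort -t: -k2,2 -n -r "$scores"

# With -u, only the key is compared, so carol:7 and bob:7 count as
# duplicates and one of them is dropped.
unique_scores=$(sort -t: -k2,2 -n -u "$scores" | wc -l)
echo "lines left after -u: $unique_scores"

rm -f "$scores"
```

In the first sort, dave:100 comes last because 100 is the largest score. After -u, three lines remain, one for each distinct score.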

sort: An Example

% cat my_file2
1234
10
10000
5679
% sort my_file2
10
10000
1234
5679
% sort -n my_file2
10
1234
5679
10000
% sort -n <my_file2
(same as above)
% cat my_file2 | sort -n
(same as above)

cut: Selecting columns from a file

cut allows us to select columns from a file. Options:

-d "delimiter"
Specify your own delimiter, which is between the columns.
-f field-list
A comma-separated list of columns or ranges.

Examples:

     cut -d";" -f3 filename
     cut -f2,3,5,7 -d"/" filename
     grep "x" filename | cut -d"," -f1,3,5-7,9-

Field-based cut examples

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega
% cut -d"," -f 3 data
Gamma
Iota
Rho
Omega
% cut -d"," -f 5 data
Epsilon
Lambda
Tau

% cut -d"," -f 2,4 data
Beta,Delta
Theta,Kappa
Pi,Sigma
Psi

More field-based cut examples

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega
% cut -d"," -f 2-4 data
Beta,Gamma,Delta
Theta,Iota,Kappa
Pi,Rho,Sigma
Psi,Omega
% cut -d"," -f 1-3,5-7 data
Alpha,Beta,Gamma,Epsilon,Zeta
Eta,Theta,Iota,Lambda,Mu,Nu
Omicron,Pi,Rho,Tau,Upsilon,Phi
Chi,Psi,Omega

Character-based cut

cut can also use -c character-list to obtain ranges of characters (not fields). No delimiter is needed.

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega
% cut -c5-12 data
a,Beta,G
Theta,Io
ron,Pi,R
Psi,Omeg
% cut -c1-3,5,7-10 data
AlpaBeta
EtaTeta,
Omirn,Pi
ChiPi,Om
% cut -c10- data
a,Gamma,Delta,Epsilon,Zeta
,Iota,Kappa,Lambda,Mu,Nu,Xi
i,Rho,Sigma,Tau,Upsilon,Phi
mega

uniq: Selecting unique lines from a file

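uniq removes repeated lines from a sorted file. It only compares each line with the one just before it, which is why the input should be sorted first.

uniq can also be used to print only the lines that appear exactly once in a file (with -u) or only those that are repeated (with -d).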
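The behaviour of uniq, -u, and -d can be seen in a small sketch. The list of colors is made-up sample data in a temporary file.

```shell
# Hypothetical sample data: red appears 3 times, blue twice, green once.
colors=$(mktemp)
printf '%s\n' 'red' 'blue' 'red' 'green' 'blue' 'red' > "$colors"

# Without sorting, no two identical lines are next to each other,
# so uniq removes nothing and all 6 lines come back.
unsorted_count=$(uniq "$colors" | wc -l)

# After sorting, identical lines are adjacent and uniq can collapse them.
all=$(sort "$colors" | uniq)          # blue, green, red
once=$(sort "$colors" | uniq -u)      # green
repeated=$(sort "$colors" | uniq -d)  # blue, red

printf 'all:\n%s\n' "$all"
printf 'appear once:\n%s\n' "$once"
printf 'repeated:\n%s\n' "$repeated"

rm -f "$colors"
```

Note that -d prints one copy of each repeated line, not every copy.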

The combination of sort, cut, and uniq is a powerful tool for text manipulation in Unix.

A More Complex Example

Consider the example file my_file3:

user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa

We want to find out how many unique ID numbers there are and also get a list of names and passwords sorted by user name.

Example

% cat my_file3
user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa
% cut -f2 -d" " my_file3
4125142
1415511
9999999
1415511
0011292
% cut -f2 -d" " my_file3 | sort -n
0011292
1415511
1415511
4125142
9999999

Example

% cut -f2 -d" " my_file3 | sort -n | uniq
0011292
1415511
4125142
9999999  
% sort my_file3
user1 4125142 passwd
user2 9999999 p_2ad(
user3 1415511 f#afk@
user4 1415511 m#@!ad
user5 0011292 lkdfaa 
% sort my_file3 | cut -f1,3 -d" "
user1 passwd
user2 p_2ad(
user3 f#afk@
user4 m#@!ad
user5 lkdfaa
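The pipeline above lists the unique ID numbers. To get the count itself, one option is to add wc -l at the end. The sketch below recreates the same five lines in a temporary file (the shell variable my_file3 stands in for the real file) so that it can be run anywhere.

```shell
# Recreate the example data in a temporary file.
my_file3=$(mktemp)
cat > "$my_file3" <<'EOF'
user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa
EOF

# Take column 2, sort it so duplicates are adjacent, drop the duplicates,
# then count the lines that are left.
id_count=$(cut -f2 -d" " "$my_file3" | sort -n | uniq | wc -l)
echo "unique IDs: $id_count"

rm -f "$my_file3"
```

There are 4 unique IDs, because 1415511 is shared by user3 and user4.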

Modified: 2018-01-08T18:45

Colorado State University, Fort Collins, CO 80523 USA
© 2015 Colorado State University