CS155: Introduction to Unix

Fall 2017

Commands 2

Text Manipulation

Vocabulary: script, wc, grep, sort, cut, uniq
Punctuation: ;
Grammar: command [option]... [argument]... [redirection]

Learning More with man and info

script: Recording a session

wc: Counting Lines and Words

wc is used to get information about the contents of a file.

% cat my_file
user1, old_user
user3
user4
user5 user6
% wc my_file
4 6 40 my_file

That’s 4 lines, 6 words, and 40 bytes. A byte is the same as a character, for now.

Every letter, digit, space, punctuation mark, and newline character counts as a byte: everything that takes up room.

wc : Some Options

Individual counts can be extracted with options: -l prints only the line count, -w only the word count, and -c only the byte count.

If multiple files are listed on the command line, then multiple counts are computed, and a total is also given. wc is useful when combined with other commands using | (the pipe symbol).
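
The options and the pipe can be tried out with a short script. This is only a sketch: it recreates my_file in a scratch directory, and other_file is a made-up second file added just to show the total line.

```shell
# Recreate the sample file from the earlier slide in a scratch directory.
dir=$(mktemp -d)
printf 'user1, old_user\nuser3\nuser4\nuser5 user6\n' > "$dir/my_file"
printf 'a\nb\n' > "$dir/other_file"    # hypothetical second file

# Reading from standard input keeps the file name out of the output.
lines=$(wc -l < "$dir/my_file")    # line count only
words=$(wc -w < "$dir/my_file")    # word count only
bytes=$(wc -c < "$dir/my_file")    # byte count only
echo "lines=$lines words=$words bytes=$bytes"

# Two files on the command line: the last output line is the total.
total_line=$(wc -l "$dir/my_file" "$dir/other_file" | tail -n 1)
echo "$total_line"

# Pipe: grep prints the lines containing "user", and wc -l counts them.
matches=$(grep "user" "$dir/my_file" | wc -l)
echo "matching lines: $matches"

rm -rf "$dir"
```

The exact spacing in front of the numbers differs between systems, so scripts should compare the counts as numbers rather than as text.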

grep : looking for text in a file

grep is an extremely useful command (it’s also extremely complicated). The simplest use is to search for an exact piece of text in a list of files.

% grep "user1" *
my_file:user1, old_user

The output is filename:line of text. That way, you know which output came from which file.

The -n option causes the line number to be printed with the file name and line.
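
A small sketch of -n, assuming a scratch directory that holds my_file plus a made-up other_file:

```shell
dir=$(mktemp -d)
printf 'user1, old_user\nuser3\nuser4\nuser5 user6\n' > "$dir/my_file"
printf 'root\nuser5 again\n' > "$dir/other_file"    # hypothetical second file

# Several files: each match is printed as filename:line number:line of text.
with_numbers=$(cd "$dir" && grep -n "user5" my_file other_file)
echo "$with_numbers"
# my_file:4:user5 user6
# other_file:2:user5 again

# One file: grep leaves out the file name, so -n prints line number:line of text.
single=$(cd "$dir" && grep -n "user3" my_file)
echo "$single"
# 2:user3

rm -rf "$dir"
```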

grep: Simple Patterns

Some symbols can be used for searching for inexact patterns. Alas, * and ? have different meanings than their use in wildcards.

Pattern     Meaning in grep
.           any single character
[aeiou]     a single vowel
[aeiou]*    zero or more vowels
[a-z]       a single lowercase letter
^           start of line
$           end of line
\?          zero or one of what came before
*           zero or more of what came before
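
Several of these patterns can be checked against a tiny word list. The file name words and its contents are invented for this sketch.

```shell
dir=$(mktemp -d)
f="$dir/words"    # hypothetical sample file, one word per line
printf 'cat\ncart\nqueue\nsky\nCat\ncot\n' > "$f"

# ^ ties the match to the start of the line, $ to the end.
starts_c=$(grep '^c' "$f")          # cat, cart, cot
ends_t=$(grep 't$' "$f")            # cat, cart, Cat, cot

# . stands for exactly one character: c, anything, t.
c_dot_t=$(grep '^c.t$' "$f")        # cat, cot

# * means zero or more of what came before, so 'car*t' also matches "cat".
car_star=$(grep '^car*t$' "$f")     # cat, cart

# Four bracket expressions in a row need four vowels in a row.
four_vowels=$(grep '[aeiou][aeiou][aeiou][aeiou]' "$f")    # queue

printf '%s\n' "$starts_c" "$ends_t" "$c_dot_t" "$car_star" "$four_vowels"
rm -rf "$dir"
```

The single quotes matter: they stop the shell from treating * as a wildcard before grep ever sees it.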

grep examples

Command                      Matches
grep 'e[as]t' my_file        “eat” or “west” but not “east”
grep 'b[oi]y' *html          “boy” but not “BOY”
grep 'windows* ' *html       “window ” and “windows ”
grep 'window.' *html         “window-” and “window,” and “windows”
grep '^Jack' foo             “Jack” only at the start of the line

sort : reorder lines of text

sort reorders lines of text lexicographically: lines are compared character by character, from the first character to the last, so “10000” sorts before “1234”.

Usage: sort [OPTION]... [FILE]...

-n makes sort do a numeric sort
-r reverses the result of the comparisons
-u removes duplicates while sorting
-t'delimiter' -kpos1,pos2 can be used to sort by column

Uniqueness while sorting by column is only checked on the field(s) specified by -k.
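
The column options can be sketched on a small colon-separated file. The file scores and its name:team:score layout are made up for illustration, and LC_ALL=C is set only to make the ordering predictable.

```shell
export LC_ALL=C    # plain byte-order comparisons, the same on every system
dir=$(mktemp -d)
f="$dir/scores"    # hypothetical file: name:team:score
printf 'dana:red:7\nalex:blue:12\ncory:red:3\nbo:blue:100\n' > "$f"

# Sort on the third field only, as numbers.
numeric=$(sort -t':' -k3,3 -n "$f")
echo "$numeric"
# cory:red:3
# dana:red:7
# alex:blue:12
# bo:blue:100

# The same sort with -r: largest score first.
reverse=$(sort -t':' -k3,3 -n -r "$f")
echo "$reverse"

# -u with -k2,2 keeps one line per team, because only field 2 is compared.
team_count=$(sort -t':' -k2,2 -u "$f" | wc -l)
echo "teams: $team_count"

rm -rf "$dir"
```

Writing -k3,3 rather than -k3 limits the key to the third field alone; -k3 by itself would run from field 3 to the end of the line.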

sort: An Example

% cat my_file2
1234
10
10000
5679
% sort my_file2
10
10000
1234
5679
% sort -n my_file2
10
1234
5679
10000
% sort -n <my_file2
(same as above)
% cat my_file2 | sort -n
(same as above)

cut: Selecting columns from a file

cut allows us to select columns from a file. Options:

-d "delimiter"
Specify your own delimiter, which is between the columns.
-f field-list
A comma-separated list of columns or ranges.

Examples:

     cut -d";" -f3 filename
     cut -f2,3,5,7 -d"/" filename
     grep "x" filename | cut -d"," -f1,3,5-7,9-

Field-based cut examples

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega
% cut -d"," -f 3 data
Gamma
Iota
Rho
Omega
% cut -d"," -f 5 data
Epsilon
Lambda
Tau

% cut -d"," -f 2,4 data
Beta,Delta
Theta,Kappa
Pi,Sigma
Psi

More field-based cut examples

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega
% cut -d"," -f 2-4 data
Beta,Gamma,Delta
Theta,Iota,Kappa
Pi,Rho,Sigma
Psi,Omega
% cut -d"," -f 1-3,5-7 data
Alpha,Beta,Gamma,Epsilon,Zeta
Eta,Theta,Iota,Lambda,Mu,Nu
Omicron,Pi,Rho,Tau,Upsilon,Phi
Chi,Psi,Omega
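
Ranges may also be left open at either end, as in the 9- of the earlier grep example. A brief sketch, using two lines of the data file recreated in a scratch directory:

```shell
dir=$(mktemp -d)
printf 'Alpha,Beta,Gamma,Delta,Epsilon,Zeta\nChi,Psi,Omega\n' > "$dir/data"

# N- selects field N through the end of the line.
from_third=$(cut -d"," -f3- "$dir/data")
echo "$from_third"
# Gamma,Delta,Epsilon,Zeta
# Omega

# -M selects the first field through field M.
first_two=$(cut -d"," -f-2 "$dir/data")
echo "$first_two"
# Alpha,Beta
# Chi,Psi

# Fields come out in file order, whatever order the list names them in.
reordered=$(cut -d"," -f3,1 "$dir/data")
echo "$reordered"
# Alpha,Gamma
# Chi,Omega

rm -rf "$dir"
```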

Character-based cut

cut can also use -c character-list to obtain ranges of characters (not fields). No delimiter is needed.

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega
% cut -c5-12 data
a,Beta,G
Theta,Io
ron,Pi,R
Psi,Omeg
% cut -c1-3,5,7-10 data
AlpaBeta
EtaTeta,
Omirn,Pi
ChiPi,Om
% cut -c10- data
a,Gamma,Delta,Epsilon,Zeta
,Iota,Kappa,Lambda,Mu,Nu,Xi
i,Rho,Sigma,Tau,Upsilon,Phi
mega

uniq: Selecting unique lines from a file

uniq removes adjacent repeated lines, which is why its input is usually sorted first.

uniq can also be used to print only lines that are unique to a file (with -u) or only those that are repeated (with -d).

The combination of sort, cut, and uniq is a powerful tool for text manipulation in Unix.
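
The following sketch shows why sorting comes first, and what -u and -d select. The file fruit and its contents are invented for the example.

```shell
dir=$(mktemp -d)
f="$dir/fruit"    # hypothetical sample file
printf 'pear\napple\npear\nfig\napple\npear\n' > "$f"

# Unsorted: no two equal lines sit next to each other, so uniq removes nothing.
unsorted_count=$(uniq "$f" | wc -l)
echo "lines after uniq alone: $unsorted_count"

# Sorted first: each distinct line appears once.
distinct=$(sort "$f" | uniq)
echo "$distinct"
# apple
# fig
# pear

# -u keeps only lines that occur exactly once.
only_once=$(sort "$f" | uniq -u)
echo "$only_once"
# fig

# -d keeps one copy of each line that occurs more than once.
repeated=$(sort "$f" | uniq -d)
echo "$repeated"
# apple
# pear

rm -rf "$dir"
```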

A More Complex Example

Consider the example file.

user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa

We want to find out how many unique ID numbers there are and also get a list of names and passwords sorted by user name.

Example

% cat my_file3
user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa
% cut -f2 -d" " my_file3
4125142
1415511
9999999
1415511
0011292
% cut -f2 -d" " my_file3 | sort -n
0011292
1415511
1415511
4125142
9999999

Example, continued

% cut -f2 -d" " my_file3 | sort -n | uniq
0011292
1415511
4125142
9999999
% sort my_file3
user1 4125142 passwd
user2 9999999 p_2ad(
user3 1415511 f#afk@
user4 1415511 m#@!ad
user5 0011292 lkdfaa
% sort my_file3 | cut -f1,3 -d" "
user1 passwd
user2 p_2ad(
user3 f#afk@
user4 m#@!ad
user5 lkdfaa
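
To get the number of unique IDs rather than the list, one possible final step is to pipe the list into wc -l. A sketch, with my_file3 recreated in a scratch directory:

```shell
dir=$(mktemp -d)
cat > "$dir/my_file3" <<'EOF'
user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa
EOF

# Pull out the ID column, sort it, drop repeats, then count what is left.
unique_ids=$(cut -f2 -d" " "$dir/my_file3" | sort -n | uniq | wc -l)
echo "unique IDs: $unique_ids"

# sort -u sorts and removes duplicates in one step.
unique_ids_u=$(cut -f2 -d" " "$dir/my_file3" | sort -n -u | wc -l)

# uniq -d shows which ID is shared by more than one user.
shared=$(cut -f2 -d" " "$dir/my_file3" | sort -n | uniq -d)
echo "shared ID: $shared"
# shared ID: 1415511

rm -rf "$dir"
```

Five users but four unique IDs: user3 and user4 share 1415511.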

Modified: 2017-01-30T09:39
