
MORE PIPES

Now that we know what piping is, we can discover some new functionalities of Linux. Let's learn how to pipe the following commands:

sort - sort lines in a file
uniq - find unique (or duplicated) lines in a pre-sorted file
tee - copy what it reads on stdin to both stdout and one or more files

:!: Exercise: Let's make a test file. Copy and paste the text below into a file called mini.gff

# A tester gff file.								
# For testing pipes.								
chrV	test	CDS	789	809	.	+	.	annotation info
chrII	test	CDS	24558	26798	.	+	.	annotation info
chrV	test	CDS	789	809	.	+	.	annotation info
chrI	test	CDS	233	236	.	+	.	annotation info
chrIV	test	CDS	1234	7654	.	-	.	annotation info
chrI	test	CDS	233	236	.	+	.	annotation info
chrII	test	CDS	24558	26798	.	+	.	annotation info
CHRI	test	CDS	11565	11951	.	+	.	annotation info
chrII	test	CDS	24558	26798	.	+	.	annotation info
chrIII	test	CDS	13678	137888	.	+	.	annotation info
CHRII	test	CDS	7997	8547	.	+	.	annotation info
chrIII	test	CDS	13678	137888	.	+	.	annotation info
chrIV	test	CDS	1234	7654	.	-	.	annotation info
chrV	test	CDS	13363	13743	.	+	.	annotation info
chrIV	test	CDS	1234	7654	.	-	.	annotation info
chrIV	test	CDS	1234	7654	.	-	.	annotation info
chrV	test	CDS	789	809	.	+	.	annotation info

Sorting files by line using sort

We can use sort to sort a file's lines into a new order…

sort usage:
sort [options] <file.txt> …

:!: Exercise: Sort the mini.gff file:

$ sort mini.gff

:!: Exercise: Read the sort man pages to figure out how you would…

  • sort in reverse order
  • sort the capital and lower case letters together
  • sort in numerical order.
  • Try some of these options
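One possible set of answers, sketched on a small made-up file rather than on mini.gff. The file name sort_demo.txt and its contents are assumptions used only for illustration; the options themselves (-r, -f, -n, -k) are standard sort options described in the man page.

```shell
# Hypothetical three-line, tab-separated file, used only for this illustration.
printf 'chrII\t200\nCHRI\t1000\nchrI\t30\n' > sort_demo.txt

# -r reverses the sort order.
sort -r sort_demo.txt

# -f ignores case, so CHRI and chrI are sorted together.
sort -f sort_demo.txt

# -k2,2n sorts on the second field only, as numbers (30, 200, 1000).
# Without -n the values would be compared as text, and 1000 would come before 200.
sort -k2,2n sort_demo.txt
```

For mini.gff, the same idea applies to the start coordinate in column 4, for example with -k4,4n.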

Find unique lines using uniq

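We can identify unique (or duplicated) lines in a pre-sorted file using the command uniq. The file must be sorted first because uniq only compares each line with the line directly before it, so identical lines that are not next to each other are not detected.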

uniq usage:
uniq [options] <sortedFile.txt>

To operate on a presorted file, we have two options. We can do the process in two steps:

  1. sort file.txt > sortedFile.txt
  2. uniq sortedFile.txt

OR, we can use the pipe operator to chain the two commands together:

$ sort mini.gff | uniq

;-) Quick tip: To find the duplicated lines, use -d as an option for uniq.
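A small sketch of why the sort step matters and of what -d does. The file uniq_demo.txt and its contents are made up for this illustration; -c is another standard uniq option that counts occurrences.

```shell
# Hypothetical input, used only for this illustration.
printf 'chrI\nchrII\nchrI\nchrII\nchrIII\n' > uniq_demo.txt

# Without sorting, no duplicates are adjacent, so uniq removes nothing (5 lines remain).
uniq uniq_demo.txt | wc -l

# After sorting, duplicates are adjacent and collapse (3 lines remain).
sort uniq_demo.txt | uniq | wc -l

# -d prints one copy of each line that occurs more than once (chrI and chrII).
sort uniq_demo.txt | uniq -d

# -c prefixes each distinct line with the number of times it occurred.
sort uniq_demo.txt | uniq -c
```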

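:!: Common pitfall: Pipes are fun, but long pipelines can be problematic with large files. The pipe itself streams data and does not cap how much passes through it, but a command such as sort has to read all of its input before it can print anything, which can use a lot of memory or temporary disk space depending on your computer or cluster. Nothing in the middle of a pipeline is saved either, so if a later step fails, every step has to be run again. In these cases, writing the intermediate result to a temp file (sometimes written as file.tmp) is preferable.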
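A minimal sketch of the temp file approach, assuming the mktemp command is available (it is on most Linux systems). The file names big_demo.txt and big_demo.uniq.txt are made up for this illustration.

```shell
# Hypothetical input, used only for this illustration.
printf 'b\na\nb\n' > big_demo.txt

# mktemp creates an empty, uniquely named temporary file and prints its path.
tmpfile=$(mktemp)

# Step 1: save the sorted result so it can be inspected or reused.
sort big_demo.txt > "$tmpfile"

# Step 2: run uniq on the saved file instead of on a pipe.
uniq "$tmpfile" > big_demo.uniq.txt

# Remove the temporary file once it is no longer needed.
rm -f "$tmpfile"
```

If step 2 fails, the sorted file is still on disk and only step 2 needs to be repeated.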


Redirect to multiple locations using tee

In an earlier class, we learned how to redirect STDOUT to a file (and STDIN from a file). If we want to direct STDOUT to both a file and the screen, we can use the tee command. tee reads from STDIN, so it is used on the receiving end of the pipe operator.

tee usage:
command | tee <filename.txt>

:!: Exercise: Try to send output from a command to both the screen and a file.

$ wc mini.gff | tee wc_output.txt

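;-) Quick tip: By default, tee only receives stdout, because a pipe carries only stdout. To capture stderr as well, put 2>&1 before the pipe. It redirects stderr (file descriptor 2) to wherever stdout (file descriptor 1) is going, so both streams travel through the pipe to tee: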

$ wc mini.gff skdjfldj 2>&1 | tee wc_stdoutstderr.txt
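The difference can be checked with a small sketch. The function both and the output file names are hypothetical helpers invented for this illustration; tee -a is the standard option for appending to a file instead of overwriting it.

```shell
# Hypothetical helper that writes one line to stdout and one line to stderr.
both() {
  echo "out line"
  echo "err line" >&2
}

# Only stdout goes through the pipe, so the file gets "out line" only.
# "err line" still appears on the screen, but it bypasses tee.
both | tee stdout_only.txt

# 2>&1 sends stderr to the same place as stdout, so tee receives both lines.
both 2>&1 | tee stdout_stderr.txt

# -a appends to the file instead of overwriting it.
echo "done" | tee -a stdout_stderr.txt
```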

:!: Exercise: Can you write a series of pipes that will determine how many unique chromosomes are represented in mini.gff?

Further Exercises2

2016pipes2.txt · Last modified: 2017/08/29 08:47 by erin