MORE PIPES

We learned a little bit about pipes last week. Now that we know what piping is, we can discover some new functionalities of Linux. Today, we'll talk about how to pipe the following commands:

sort - sort lines in a file
uniq - find unique (or duplicated) lines in a pre-sorted file
tee - send a command's output to multiple locations (the screen and one or more files)

:!: Exercise: Let's make a test file. Copy and paste the text below into a file called mini.gff

# A tester gff file.								
# For testing pipes.								
chrV	test	CDS	789	809	.	+	.	annotation info
chrII	test	CDS	24558	26798	.	+	.	annotation info
chrV	test	CDS	789	809	.	+	.	annotation info
chrI	test	CDS	233	236	.	+	.	annotation info
chrIV	test	CDS	1234	7654	.	-	.	annotation info
chrI	test	CDS	233	236	.	+	.	annotation info
chrII	test	CDS	24558	26798	.	+	.	annotation info
CHRI	test	CDS	11565	11951	.	+	.	annotation info
chrII	test	CDS	24558	26798	.	+	.	annotation info
chrIII	test	CDS	13678	137888	.	+	.	annotation info
CHRII	test	CDS	7997	8547	.	+	.	annotation info
chrIII	test	CDS	13678	137888	.	+	.	annotation info
chrIV	test	CDS	1234	7654	.	-	.	annotation info
chrV	test	CDS	13363	13743	.	+	.	annotation info
chrIV	test	CDS	1234	7654	.	-	.	annotation info
chrIV	test	CDS	1234	7654	.	-	.	annotation info
chrV	test	CDS	789	809	.	+	.	annotation info

Sorting files by line using sort

We can use sort to sort a file's lines into a new order…

sort usage:
sort [options] <file.txt> …

:!: Exercise: Sort the mini.gff file:

$ sort mini.gff

:!: Exercise: Read the sort man pages to figure out how you would…

  • sort in reverse order
  • sort the capital and lower case letters together
  • sort in numerical order.
  • Try some of these options

Find unique lines using uniq

We can identify unique (or duplicated) lines in a pre-sorted file using the command uniq.

uniq usage:
uniq [options] <sortedFile.txt>

uniq needs a pre-sorted file, so we have two options. We can do the process in two steps:

  1. sort file.txt > sortedFile.txt
  2. uniq sortedFile.txt

OR, we can use the pipe operator to chain the two commands together:

$ sort mini.gff | uniq
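
Why sort first? uniq only compares each line with the line directly before it, so duplicates that are not next to each other slip through. A minimal sketch, using a small made-up file (demo.txt and the result file names are hypothetical, not part of the exercise):

```shell
# demo.txt is a hypothetical scratch file with non-adjacent duplicates.
printf 'chrV\nchrII\nchrV\nchrII\n' > demo.txt

# Without sorting, no two neighbouring lines match, so nothing is removed.
uniq demo.txt > unsorted_result.txt

# After sorting, duplicates sit next to each other and uniq collapses them.
sort demo.txt | uniq > sorted_result.txt

wc -l < unsorted_result.txt   # 4 lines: nothing was collapsed
wc -l < sorted_result.txt     # 2 lines: chrII and chrV
```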

;-) Quick tip: To find the duplicated lines, use -d as an option for uniq.
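
As a rough illustration of the tip, the sketch below runs uniq -d on a small made-up file, alongside -u, which does the opposite and keeps only lines that occur exactly once. The file names are hypothetical:

```shell
# dups.txt is a hypothetical scratch file: chrI appears twice,
# chrII three times, chrIII once.
printf 'chrI\nchrII\nchrI\nchrIII\nchrII\nchrII\n' > dups.txt

# -d prints one copy of each line that occurs more than once.
sort dups.txt | uniq -d > repeated.txt

# -u prints only the lines that occur exactly once.
sort dups.txt | uniq -u > singletons.txt

cat repeated.txt     # chrI and chrII
cat singletons.txt   # chrIII
```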

:!: Common pitfall: Pipes are fun, but pipes can be awkward with large files. Nothing in a pipeline is saved along the way, so if a later step fails, or you want an intermediate result again, every earlier step (including a slow sort) has to be rerun. In these cases, creating a temp file (sometimes written as file.tmp) is preferable.
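
The temp file approach might look something like this. It is only a sketch, assuming mktemp is available (it is on most Linux and macOS systems); big.txt stands in for a large file, and all file names here are made up for illustration:

```shell
# big.txt stands in for a large input file (hypothetical name and contents).
printf 'b\na\nb\nc\na\n' > big.txt

# Create a uniquely named temp file; the sorted.XXXXXX template is arbitrary.
tmpfile=$(mktemp "${TMPDIR:-/tmp}/sorted.XXXXXX")

# Step 1: do the sort once and keep the result on disk.
sort big.txt > "$tmpfile"

# Step 2: reuse the sorted file as often as needed without sorting again.
uniq "$tmpfile" > unique_lines.txt
uniq -d "$tmpfile" > duplicated_lines.txt

# Clean up once the temp file is no longer needed.
rm -f "$tmpfile"
```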


Redirect to multiple locations using tee

In an earlier class, we learned how to redirect STDOUT and STDIN to a file. If we want to direct STDOUT to both a file and the screen, we can use the tee command. tee reads from STDIN, so it is used after the pipe operator.

tee usage:
command | tee <filename.txt>

:!: Exercise: Try to send output from a command to both the screen and a file.

$ wc mini.gff | tee wc_output.txt
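
tee does not have to come last. Because it also passes its input along on STDOUT, it can sit in the middle of a pipeline and save an intermediate result. A small sketch with made-up file names and contents:

```shell
# samples.txt is a hypothetical scratch file with one repeated line.
printf 'liver\nbrain\nliver\n' > samples.txt

# tee writes a copy of the sorted, de-duplicated lines to unique_samples.txt
# and passes the same lines along to wc -l, which counts them.
sort samples.txt | uniq | tee unique_samples.txt | wc -l > count.txt

cat unique_samples.txt   # brain and liver
cat count.txt            # 2
```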

;-) Quick tip: tee only sees what arrives through the pipe, which by default is just stdout. To capture stderr as well, merge it into stdout with 2>&1 before the pipe:

$ wc mini.gff skdjfldj 2>&1 | tee wc_stdoutstderr.txt
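
In that command, 2>&1 means "send file descriptor 2 (stderr) to wherever file descriptor 1 (stdout) is currently going". Since stdout is going into the pipe, stderr follows it and tee receives both. The sketch below shows the difference using a made-up shell function (both is a hypothetical name) that writes one line to each stream:

```shell
# A stand-in command that writes one line to stdout and one to stderr.
both() {
  echo "this goes to stdout"
  echo "this goes to stderr" >&2
}

# Without 2>&1, only stdout enters the pipe, so tee saves one line.
# The stderr line still appears on the screen, but not in the file.
both | tee stdout_only.txt

# With 2>&1, stderr is pointed at the pipe too, so tee saves both lines.
both 2>&1 | tee stdout_and_stderr.txt
```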

:!: Exercise: Can you write a series of pipes that will determine how many unique chromosomes are represented in mini.gff?

Permissions

2016pipes2.txt · Last modified: 2016/09/01 09:07 by erin