# NSCI 580A3 fall 2017

Instructors
Tai Montgomery
Erin Nishimura


# Clustering

#### References

Cluster analysis and display of genome-wide expression patterns. Eisen et al., 1998, PNAS.

How does gene expression clustering work? D'haeseleer, P., 2005, Nature Biotechnology.

Hierarchical Clustering With R - Part 1. Stats-Lab Dublin, part 1 of a 4-part YouTube series.

#### Resources

There are many functions/packages in R that perform clustering.

• heatmap - base R
• heatmap.2 - in the gplots package
• pheatmap - pheatmap package (pretty heatmap)
• Heatmap - ComplexHeatmap package, my current favorite
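
As a rough illustration of what these functions expect, here is a minimal sketch using base R's heatmap on a made-up matrix. The matrix `expr` and its gene and sample names are hypothetical, not course data.

```r
# Toy expression matrix: 6 hypothetical genes (rows) x 4 hypothetical samples (columns)
set.seed(1)
expr <- matrix(rnorm(24), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

pdf(NULL)  # discard graphics output so the sketch also runs non-interactively
# Base R heatmap: clusters rows and columns, and scales each row by default
hm <- heatmap(expr, scale = "row")
invisible(dev.off())

# heatmap() invisibly returns the row and column order used in the plot
hm$rowInd
hm$colInd
```

The other functions listed above (heatmap.2, pheatmap, Heatmap) also take a numeric matrix as their first argument, though their options for scaling and clustering are named differently.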

#### What is the goal?

1. To visually represent a complex multi-dimensional dataset
2. To parse genes into different groups depending on their expression patterns across samples
3. To identify more complex relationships between genes and conditions than can be seen in pair-wise comparisons

#### What are our options?

There are many different ways to cluster genes:

• Hierarchical clustering
• K-means clustering
• Self-organizing maps (SOMs)

Today we'll look at hierarchical clustering.

#### Steps

• Filter the genes: select only the genes that change
  • This removes genes that are constant across samples
• Center and scale the expression values
  • For each gene, set the mean expression across the dataset to 0 and the standard deviation to 1
  • Use t(scale(t(matrix))) prior to calling the clustering function
  • OR, look for a 'scale by rows' option within the clustering function
  • Why? To focus on the trend of each gene's expression pattern, not the intensity
• Calculate the distance between each gene and every other gene
  • Test how “similar” each gene's pattern of expression is to every other gene's
  • Calculate a metric of similarity
  • Many options (see the method argument of the dist function in R):
    • “euclidean”
    • “canberra”
    • “manhattan”
  • Correlation-based distance (Pearson or Spearman) is not a dist method; compute it with cor instead, e.g. as.dist(1 - cor(t(matrix)))
• Hierarchically cluster the genes based on their similarities
  • Figure out how to group genes together based on similarity; the rules for merging groups are called “linkage methods”
  • Many options (see the method argument of hclust in R):
    • complete
    • single
    • average
• Draw a heatmap
  • Colors represent values
  • Draw a dendrogram alongside the clustered values
  • Note: the order of the leaves from top to bottom isn't informative. It's the height at which branches join that is informative
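
The steps above can be sketched in base R as follows. This is a minimal sketch on simulated data, not the course dataset: the matrix `expr`, the gene and sample names, the sd > 0 filter, the choice of average linkage, and the use of cutree with 4 groups are all assumptions made for illustration.

```r
# Toy data: 20 hypothetical genes (rows) x 6 hypothetical samples (columns)
set.seed(42)
expr <- matrix(rnorm(120, mean = 8, sd = 2), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))
expr[1:3, ] <- 5  # three genes that are constant across samples

# 1. Filter: keep genes whose expression varies across samples.
#    The cutoff (sd > 0) is an illustrative assumption; real analyses
#    usually use a stricter filter.
gene_sd  <- apply(expr, 1, sd)
changing <- expr[gene_sd > 0, ]

# 2. Center and scale each gene (row) to mean 0 and standard deviation 1.
#    scale() works on columns, hence the double transpose.
scaled <- t(scale(t(changing)))

# 3. Distances between genes: Euclidean via dist(), or correlation-based
#    via cor(), which is not one of dist()'s methods.
d_euclid <- dist(scaled, method = "euclidean")
d_cor    <- as.dist(1 - cor(t(scaled), method = "pearson"))

# 4. Hierarchical clustering with a chosen linkage method.
hc <- hclust(d_cor, method = "average")

# Optional: cut the tree into a fixed number of groups (4 is arbitrary here).
groups <- cutree(hc, k = 4)

# 5. Heatmap with the gene dendrogram; rows are already scaled, so scale = "none".
pdf(NULL)  # discard graphics output so the sketch also runs non-interactively
heatmap(scaled, Rowv = as.dendrogram(hc), Colv = NA, scale = "none")
invisible(dev.off())
```

Swapping `d_cor` for `d_euclid`, or "average" for "complete" or "single", can change which genes end up grouped together, so it is worth trying more than one combination before interpreting the clusters.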