Cluster analysis and display of genome-wide expression patterns. Eisen et al., 1998, PNAS.

How does gene expression clustering work? D'haeseleer, P., 2005, Nature Biotechnology.

Hierarchical Clustering With R - Part 1. Stats-Lab Dublin, part 1 of a 4-part YouTube series.


Many R functions and packages perform clustering and draw the result as a heatmap:

  • heatmap - base R
  • heatmap.2 - in the gplots package
  • pheatmap - pheatmap package (pretty heatmap)
  • Heatmap - ComplexHeatmap package, my current favorite

What is the goal?

  1. To represent a complex, multi-dimensional dataset visually
  2. To parse genes into different groups depending on their expression patterns across samples
  3. To identify more complex relationships between genes and conditions than can be seen in pair-wise comparisons

What are our options?

There are many different ways to cluster genes:

  • Hierarchical clustering
  • K means clustering
  • Self Organizing Maps (SOMs)

Today we'll look at hierarchical clustering.


  • Filter the genes – select only changing genes
    • This removes genes that are constant across samples
  • Center & Scale the reads
    • For each gene, set the mean expression across the dataset to 0 and the standard deviation to 1.
    • Use t(scale(t(matrix))) prior to calling the clustering function
    • OR, look for a 'scale by rows' option within the clustering function
    • Why? To focus in on the trend of each gene's expression pattern, not the intensity
  • Calculate the distances between each gene and every other gene
    • Test how “similar” each gene's pattern of expression is to every other gene.
    • Calculate a metric of similarity
    • Many options (see the method argument of the dist function in R):
      • “euclidean”
      • “canberra”
      • “manhattan”
      • “correlation” (Pearson or Spearman); this is not a dist method, so compute it with cor, e.g. as.dist(1 - cor(t(matrix)))
  • Hierarchically cluster the genes based on their similarities. The rules for merging clusters are called “linkage methods”.
    • Figure out how to group genes together based on similarity
    • Many options (look into hclust in R for options)
      • complete
      • single
      • average
  • Draw a heatmap
    • Colors represent values
    • Draw a dendrogram from the clustering
    • Note: the order from top to bottom isn't informative, because the branches can be flipped at any node. What is informative is the height at which branches join.
    • Think about labels
    • Try to use color-blind-friendly color schemes instead of the originally popular red-green.
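
As a minimal sketch of the filtering and centering/scaling steps above, the base R code below works on a small made-up matrix. The gene names, sample names, and the variance cutoff of 0 are assumptions chosen for illustration, not values from a real experiment.

```r
# Toy expression matrix: 6 hypothetical genes (rows) x 4 samples (columns).
expr <- rbind(
  geneA = c(1, 2, 3, 4),
  geneB = c(10, 20, 30, 40),
  geneC = c(4, 3, 2, 1),
  geneD = c(5, 5, 5, 5),          # constant across samples
  geneE = c(100, 101, 100, 101),
  geneF = c(8, 6, 4, 2)
)
colnames(expr) <- paste0("sample", 1:4)

# Filter: keep only genes that change across samples.
# A cutoff of 0 is an arbitrary choice that drops only constant genes;
# real data would usually need a stricter, data-dependent threshold.
gene_var <- apply(expr, 1, var)
changing <- expr[gene_var > 0, , drop = FALSE]

# Center and scale each gene. scale() operates on columns, so transpose
# on the way in and again on the way out.
scaled <- t(scale(t(changing)))

round(rowMeans(scaled), 10)   # each gene now has mean 0
apply(scaled, 1, sd)          # and standard deviation 1
```

In this toy data geneA and geneB differ tenfold in intensity but become identical after scaling, which is the reason for focusing on the trend rather than the intensity.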
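
The distance and linkage steps above can be sketched with dist, cor, hclust, and cutree from base R. The toy genes, the choice of average linkage, and the choice of k = 3 groups are assumptions made for this example only.

```r
# Toy data: hypothetical genes (rows) x 4 samples (columns) with three
# made-up expression patterns (rising, falling, peaking).
expr <- rbind(
  up1   = c(1, 2, 3, 4),
  up2   = c(10, 21, 29, 40),
  down1 = c(4, 3, 2, 1),
  down2 = c(40, 31, 19, 10),
  peak1 = c(1, 5, 5, 1),
  peak2 = c(2, 9, 10, 2)
)
scaled <- t(scale(t(expr)))   # center and scale each gene

# Euclidean distance between every pair of genes (dist works on rows).
d_euc <- dist(scaled, method = "euclidean")

# Correlation-based distance. cor() correlates columns, so transpose to
# correlate genes. 1 - r is 0 for perfectly correlated genes and 2 for
# perfectly anti-correlated genes.
d_cor <- as.dist(1 - cor(t(scaled), method = "pearson"))

# Hierarchical clustering; "average" is one of several linkage methods
# ("complete" is the hclust default, "single" is another option).
hc_cor <- hclust(d_cor, method = "average")
hc_euc <- hclust(d_euc, method = "complete")

# Cut the tree into 3 groups (3 suits this toy data; it is not a general rule).
groups <- cutree(hc_cor, k = 3)
split(names(groups), groups)

# plot(hc_cor) would draw the dendrogram.
```

Trying a different distance or linkage method on the same matrix is a quick way to see how sensitive the grouping is to those choices.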
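
For the heatmap step above, here is a hedged sketch using heatmap from base R with a blue-white-red palette in place of red-green. The random data, the palette colors, and the distance and linkage choices are all assumptions for illustration; heatmap.2, pheatmap, and Heatmap take similar but not identical arguments.

```r
set.seed(1)  # reproducible made-up data
expr <- matrix(
  rnorm(40), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("sample", 1:4))
)
scaled <- t(scale(t(expr)))   # center and scale each gene

# Diverging blue-white-red palette, one common alternative to red-green.
pal <- colorRampPalette(c("navy", "white", "firebrick"))(51)

pdf(NULL)  # null graphics device so the sketch runs without opening a window
hm <- heatmap(
  scaled,
  scale     = "none",   # rows are already scaled, so skip heatmap()'s own scaling
  distfun   = function(x) as.dist(1 - cor(t(x))),          # correlation distance
  hclustfun = function(d) hclust(d, method = "average"),   # average linkage
  col       = pal
)
invisible(dev.off())

# Row order as drawn in the heatmap.
rownames(scaled)[hm$rowInd]
```

Remove the pdf(NULL) and dev.off() lines to see the plot in an interactive session.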

Clustering Demo

wiki2016_rna_clustering.txt · Last modified: 2017/12/05 09:35 by erin