Chromosome Interactions, Heatmaps and matplotlib

I’m currently working on some chromosome conformation data based on 3C and related technologies. These tools allow you to look at the physical structure of the nucleus and determine how chromosomes are positioned in 3 dimensions, and where regions of one chromosome physically touch regions of another. These touching regions affect the regulation of genes in those regions.

The earlier iterations of these techniques are based on looking at interactions from a particular region or locus, to study the interaction with a specific set of genes. Later techniques allow a more flexible whole-genome approach. We are working with Hi-C data, which combines 3C/4C methods with paired-end sequencing, so that each end of the pair is at a different point on the genome.

After processing, this data is a neat list of interacting pairs, with each end as a chromosome coordinate. Our data produces between 2 and 10 million pairs. Most pairs are intra-chromosome (also called “cis” interactions) but around 5% are inter-chromosome (“trans” interactions).

We now need a method to visualise this data. The prevalent method is the heatmap. The interaction points are grouped together into chromosomes, or regions of 100kb or 1mb (we could think of this as quantization). Interacting regions are marked out in a 2-dimensional matrix and then drawn as a heatmap.

For chromosome-chromosome heatmaps we might also normalise each spot based on the number of expected interactions. This is based on the length of the chromosomes – longer chromosomes should give more interactions on average. We also tend to remove cis-interactions – by reducing them to 0 – since otherwise they dominate and we can’t see the relative strengths of the trans-interactions.

Given the pairs of interactions, it’s easy enough to build the matrix that represents the interactions. Rendering this is matplotlib provides quite a few options. I’ve generated some sample data and showed a few options below, using the following code:

import pylab

chromnames = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']
values = pylab.zeros((24, 24))

values[2][6] = 3
values[6][2] = 3
values[23][1] = 13
values[1][23] = 13
values[5][2] = 5
values[2][5] = 5
values[12][14] = 11
values[14][12] = 11
values[19][9] = 6
values[9][19] = 6


f = pylab.Figure()

#pylab.pcolor(values)
#imagename = "pcolor.png"

# pylab.imshow(values)
# imagename = "imshow.png"

pylab.matshow(values)
imagename = "matshow.png"


pylab.xticks(range(len(chromnames)), chromnames, rotation=90)
pylab.yticks(range(len(chromnames)), chromnames)

pylab.savefig(imagename)

My first stab was to use imshow(), but this bleeds each coloured blob into the surroundings, which looks nice but makes it hard to read the individual blobs:

imshow()

There is also a pcolor() function, but this plots from bottom to top (as you might expect) which is not ideal for what we want here. I’d have to turn the data and y axis labelling upside down, which is a little awkward:

pcolor()

I finally plumped for matshow() (which stands for “matrix show”), which does things in the right order with solid blobs:

matshow()

However it’s clearly not without its problems. The xtick labels have not rotated, so I had to do it manually, like this:

im = pylab.matshow(values)
for label in im.axes.xaxis.get_ticklabels():
	label.set_rotation(90)

You might also be able to see that the top and bottom rows are half-height – I had to correct this by adding dummy row labels here. I don’t know why this happens.

There are various other problems with the other plots too. I think it would be safe to say that matplotlib is a powerful tool but if you want plots in a particular way, it can take a lot of documentation bashing to get there. As always, I usually feel that the way I want it should be the standard and it should do that in the first place, but I can never tell how unreasonable I am being when I say this.

An open question here is whether heatmaps are actually the best way to visualise this data. I’m also working on a student project at the moment where we use a graph-based visualisation instead. My theory is that we can use a spring embedder algorithm to automatically lay out the graph of interactions between a pair of chromosomes and show their conformation. I think the usual wisdom here is that no visualisation is “right” – best to have several at your disposal to allow you to look at the data in a variety of different ways.

Advertisements

4 Responses to “Chromosome Interactions, Heatmaps and matplotlib”

  1. John Hunter Says:

    imshow has an interpolation argument that defaults to “bilinear”, but you can set it to “nearest” so it doesn’t do averaging and look “blurry”. Your default interpolation can be set using the “image.interpolation” parameter in your rc file (see http://matplotlib.sourceforge.net/users/customizing.html) so in this case you can make the default behavior work the way you like.

  2. Patricia Says:

    Hi,

    I’m also trying to draw a heatmap with matshow but when I add the ticks, the figure decrease size and my colorbar stay bigger than the image.

    I only started using matplotlib like a month ago and this is not obvious to me.
    And your figure doesn’t seems to have problems. If you have any idea why this is happening I would appreciate some feedback.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: