
numpy and NaN again

July 30, 2010

As a continuation of my heatmap work, I’ve been normalising the interaction frequencies against the density of other biological features. In particular, we are more likely to detect interactions around sites of the restriction enzyme we used to separate the interacting partners after cross-linking – in our case, HindIII (which my biological colleagues pronounce “hindi – 3”). Therefore, we’d like to normalise against this density: if we pick up an interaction in a region of low HindIII density, we should consider it more significant than one in a region of high HindIII density.

First of all, we compute the density of HindIII sites at the same resolution as our heatmaps – 1Mb. We then divide each density by the median HindIII density; this gives us the relative density of each 1Mb region compared to the median. Each interaction is between two 1Mb regions, and is more likely to be detected when those regions are dense in HindIII sites. Therefore, we want to normalise each spot on the heatmap by the product of the relative HindIII densities of its two regions.
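As an illustration, here’s a minimal sketch of that relative-density calculation, using a made-up array of site counts rather than real data:

import numpy

# Hypothetical HindIII site counts, one per 1Mb bin.
density = numpy.array([12., 8., 0., 15., 9., 11.])

# Relative density of each bin against the median count.
rel_density = density / numpy.median(density)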

We can do this conveniently in numpy, because numpy gives us native matrix operations. Our heatmap has one chromosome on the x axis and another on the y axis. We take the relative HindIII densities for the y-axis chromosome and turn them into a column vector by doing:


ypoints.reshape(-1, 1)

Then we can generate a matrix of products with the x axis densities by doing:


normmatrix = xpoints * ypoints.reshape(-1, 1)

To satisfy myself that this produced sensible output, I did this:


>>> from numpy import arange
>>> arange(10).reshape(-1,1) * arange(10)
array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18],
       [ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27],
       [ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36],
       [ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45],
       [ 0,  6, 12, 18, 24, 30, 36, 42, 48, 54],
       [ 0,  7, 14, 21, 28, 35, 42, 49, 56, 63],
       [ 0,  8, 16, 24, 32, 40, 48, 56, 64, 72],
       [ 0,  9, 18, 27, 36, 45, 54, 63, 72, 81]])
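Incidentally, this row-by-column product is just the outer product of the two vectors, so numpy’s outer function gives the same matrix directly:

>>> from numpy import arange, outer
>>> (outer(arange(10), arange(10)) == arange(10).reshape(-1, 1) * arange(10)).all()
True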

Then normalise the heatmap matrix by doing:


normalised = matrix / normmatrix

This is very concise and intuitive. However, as the title of this post suggests, I ended up tangling with NaNs again.

At various points, the density of HindIII sites drops to 0, which means we’ll be dividing by zero for some of the matrix cells. Exactly what numpy does about this varies in some slightly odd ways, in particular when our heatmap matrix also contains 0 at that position, so that we’re dividing zero by zero.

In regular Python, here’s what happens when you divide zero by zero:


>>> 0 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> 0. / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: float division
>>> 0 / 0.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: float division
>>>

Note that in each case, we get a ZeroDivisionError exception, but when either the denominator or numerator is a float, the error message is slightly different. If we do the same inside a numpy array:


>>> from numpy import array
>>> array([0]) / array([0])
array([0])
>>> array([0.]) / array([0])
array([ NaN])
>>> array([0]) / array([0.])
array([ NaN])

This is particularly odd because:


>>> array([2]) / array([0])
array([0])
>>> array([2.]) / array([0])
array([ Inf])
>>> array([2.]) / array([0.])
array([ Inf])
>>> array([0.]) / array([0.])
array([ NaN])

So in numpy, an integer divided by zero gives 0, a non-zero float divided by zero gives Inf, and a float zero divided by zero gives NaN. This seems pretty odd to me, but then I thought the other NaN behaviour was weird too, and that turned out to be based on the IEEE 754 standard.

To get sensible heatmaps from this, I have to filter out both infs and nans and replace them with 0s. Filtering inf is easy, using numpy’s selective assignment:


normalised[normalised == numpy.inf] = 0
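As an aside, numpy.isinf would do the same job and also catch -Inf, should negative values ever creep into the matrix:

normalised[numpy.isinf(normalised)] = 0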

You can’t do the same with nan, though, because numpy.nan != numpy.nan – NaN never compares equal to anything, not even itself. Fortunately, numpy provides a function to do just what I’m after:


normalised = numpy.nan_to_num(normalised)
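Putting it all together, here’s a minimal sketch of the whole normalisation step as a single function – the function name and the float conversion are my own additions, not from the real pipeline:

import numpy

def normalise_heatmap(matrix, xdensity, ydensity):
    # Work in floating point so the divisions below produce
    # Inf/NaN rather than truncated integers.
    xdensity = numpy.asarray(xdensity, dtype=float)
    ydensity = numpy.asarray(ydensity, dtype=float)

    # Relative HindIII density of each 1Mb bin against the median.
    xrel = xdensity / numpy.median(xdensity)
    yrel = ydensity / numpy.median(ydensity)

    # Outer product of the two density vectors, via broadcasting.
    normmatrix = xrel * yrel.reshape(-1, 1)

    # Normalise, then zero out the Infs and NaNs that come from
    # zero-density bins.
    normalised = matrix / normmatrix
    normalised[numpy.isinf(normalised)] = 0
    return numpy.nan_to_num(normalised)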
