Archive for March, 2011

Automount USB drives – Ubuntu Server

March 21, 2011

If you try to use Ubuntu as a server, sometimes you come a cropper of relying on the facilities provided by the desktop version. I use USB drives to manage some of my data and give them volume labels so that they’re always mounted in the same place – /media/label. This doesn’t work if you’re not logged in to the GUI.

At first, I got tangled up in usbmount, which is recommended. However, it mounts the drives into usb0, usb1 etc. I couldn’t get it to mount by volume name – it even complained “/dev/sdb does not contain a filesystem or disklabel”.

The solution was to install halevt, which provides the hardware abstraction layer and an event handler to deal with plugging in USB drives.

Advertisements

Gene and Exon Expression Using Affymetrix Mouse Exon Arrays (MoEx)

March 15, 2011

Our lab makes extensive use of Affymetrix expression arrays to measure gene expression. I’ve already processed several different types of mouse and human gene arrays to determine which genes are active or inactive in the experimental sample. The pipeline for this is based on advice from contacts at the Sanger, and is based on the bioconductor Affy package. We use RMA but also MAS5 absent/present calls.

Some of our work requires greater resolution than gene expression – we want a measure of which exons are the most transcribed. This is close to the related problem of determining the specific transcript of a particular gene (which set of exons) is transcribed in the cell type used in the experiment. You can do this with RNA-seq, but it’s quite expensive and the tools aren’t very mature yet. We chose to go with Affymetrix’s mouse exon arrays, or MoEx for short.

However, using R/Bioconductor to derive RMA and MAS5 for MoEx is not as easy as the gene arrays. In R you should be able to do (crudely):


> library("affy")
Loading required package: Biobase

Welcome to Bioconductor

Vignettes contain introductory material. To view, type
'openVignette()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation(pkgname)'.

> rawAffy rmaAffy <- rma(rawAffy)

but for MoEx you get:


Error in getCdfInfo(object) :
Could not obtain CDF environment, problems encountered:
Specified environment does not contain MoEx-1_0-st-v1
Library - package moex10stv1cdf not installed
Bioconductor - moex10stv1cdf not available

For gene arrays, you can usually just use bioclite to install a Bioconductor library that includes the CDF environment, and indeed there is a bioclite package for MoEx, but it doesn’t fix the error message.

I’m sure if I was better at R I could resolve this, but instead I downloaded the CDF manually from the XMap page (more about XMap in a minute) and then do this:


library("affy")
adata <- ReadAffy()
adata@cdfName <- "mouseexonpmcdf"
rma <- rma(adata)

Then you can dump this list of RMA values like this:


write.exprs(rma, "RMA_output.csv", sep="\t")

This produces a file that has RMA values for each MoEx probeset for each biorep we used. Each probeset represent a set of oligo probes which, together, cover a region of the genome to provide information about the exon it overlaps. Now we need to translate probeset IDs to exons. In my case, I want Ensembl exon IDs, since Ensembl is the gold standard annotation for us.

With GeneChip arrays, Ensembl has computed mappings from Affy’s probeset IDs to Ensembl genes, and you can get hold of this information quickly using BioMart. Ensembl also has mappings for MoEx, but these are from MoEx probeset IDs to gene IDs, and not exon IDs.

I tried a variety of methods to derive a specific mapping to exon IDs. I used the coordinates Affy provides for its probes (and thus probesets), which can be obtained under technical documentation (registration required?) to look up the overlapping exons from a chosen Ensembl annotation. I also tried using the sequences of the probes (also available on the Affy web page) and BLAST them against the genome to obtain the coordinates of the probesets myself, and then did the overlapping as before. However, all these methods are rather hacky and I’m by no means an expert with this kind of technology so I can’t be tell if what I’ve done is appropriate or not.

Instead, we took advice and decided to use the Manchester Paterson’s exonmap mapping. Exonmap is the new name for XMap, which I mentioned earlier – they seem to be the MoEx experts. Exonmap is actually a full bioconductor package, that allows you to derive RMA values for specific gene lists and a variety of other functionality. However, as always, we need more fine-grained control over how probesets are combined. So I just extracted the probeset to Ensembl exon ID mapping from the exonmap database tables to use for my stuff.

Downstream, I have custom code to work out expression level for individual exons and genes as well as differential expression between treatment groups. We use an approach based on the splice index method.

However, there are other packaged solutions for doing this. One we tried was AltAnalyze, a Python tool with a nice GUI that allows you to run a variety of types of exon analysis on MoEx and other experiment types. It comes automatically packaged with the Affy power tools (although you can also get those yourself) and a selection of Affy annotation files.

As usual, we needed something we could fiddle with more, so we still ended up with custom code. However, we did use the background-corrected calculation for RMA values from AltAnalyze, which is actually an Affy power tools command:


$HOME/src/AltAnalyze_v1release/AltDatabase/affymetrix/APT/Linux/64bit/apt-probeset-summarize -p $HOME/src/AltAnalyze_v1release/AltDatabase/affymetrix/LibraryFiles/MoEx-1_0-st-v1.r2.pgf -c $HOME/src/AltAnalyze_v1release/AltDatabase/affymetrix/LibraryFiles/MoEx-1_0-st-v1.r2.clf -b $HOME/src/AltAnalyze_v1release/AltDatabase/affymetrix/LibraryFiles/MoEx-1_0-st-v1.r2.antigenomic.bgp --kill-list $HOME/src/AltAnalyze_v1release/AltDatabase/EnsMart55/Mm/exon/Mm_probes_to_remove.txt -a rma-sketch -a dabg --cel-files cel_files.txt -o .

Obviously this has a lot of file dependencies from AltAnalyze, but this gives us not only RMA values, but p-value confidences for those values, which we use downstream. This replaces the R/Bioconductor RMA value generation. I expect there are arguments to be had about which tool generates the best expression values.

I like this Affy power tools approach, because it does not rely on R/Bioconductor and can be easily scripted. We managed to find an ArrayExpress entry containing MoEx bioreps for 54 different mouse tissues. Using the Affy power tools, I wrote a script to generate RMA values and apply some other parts of our pipeline to all 54 groups. We could then compare probeset levels in our experiments to all these tissue types to derive the relative level of expression for exons in our samples – all this took only a matter of days.